Neural network for generating synthetic medical images

ABSTRACT

Systems, computer-implemented methods, and computer readable media for generating a synthetic image of an anatomical portion based on an origin image of the anatomical portion acquired by an imaging device using a first imaging modality are disclosed. These systems may be configured to receive the origin image of the anatomical portion acquired by the imaging device using the first imaging modality, receive a convolutional neural network model trained for predicting the synthetic image based on the origin image, and convert the origin image to the synthetic image through the convolutional neural network model. The synthetic image may resemble an imaging of the anatomical portion using a second imaging modality differing from the first imaging modality.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a U.S. National Stage Filing under 35 U.S.C. 371 from International Application No. PCT/US2017/042136, filed on Jul. 14, 2017, and published as WO2018/048507 on Mar. 15, 2018, which claims the benefit of priority from U.S. Provisional Patent Application No. 62/384,171 filed on Sep. 6, 2016 and U.S. Provisional Patent Application No. 62/408,676 filed on Oct. 14, 2016, the benefit of priority of each of which is hereby claimed herein, and which applications and publication are hereby incorporated herein by reference in their entirety.

TECHNICAL FIELD

This disclosure generally relates to generating synthetic images using machine learning algorithms for use in radiation therapy. More specifically, this disclosure relates to systems and methods for generating computed tomography (CT) images from Magnetic Resonance Imaging (MRI) images using neural networks.

BACKGROUND

CT imaging has traditionally been used as the primary source of image data in the planning process of external radiation therapy. CT images offer an accurate representation of patient geometry, and CT values can be directly converted to electron densities for radiation dose calculation. However, because conventional CT images are generated through use of an imaging radiation source, using CT may expose a patient to additional radiation dosages. In recent years, interest in replacing CT images with MRI images in the radiotherapy treatment planning process has emerged. This is because MRI is free of ionizing radiation and provides superior soft tissue contrast for more accurate target and structure delineation. MRI also captures functional information of the human body, such as tissue metabolism and functionality. However, MRI intensity values are not directly related to electron densities, and conventional MRI sequences cannot obtain MRI signal from bone. Therefore, it would be beneficial if a “pseudo CT image” or a “synthetic image,” such as a pseudo or synthetic CT image (referred to herein as a “synthetic CT image”), could be derived from an acquired MRI image.
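
To make the CT-to-electron-density relationship mentioned above concrete, the following is a minimal Python sketch of a piecewise-linear calibration curve; the calibration points are illustrative assumptions, since real curves are scanner-specific and measured with a calibration phantom.

```python
import numpy as np

# Hypothetical CT-number (HU) to relative-electron-density calibration points.
HU_POINTS = np.array([-1000.0, 0.0, 1000.0, 3000.0])
RED_POINTS = np.array([0.0, 1.0, 1.55, 2.5])

def hu_to_relative_electron_density(ct_volume):
    """Map CT numbers (Hounsfield units) to relative electron density by
    piecewise-linear interpolation along the calibration curve."""
    return np.interp(ct_volume, HU_POINTS, RED_POINTS)

# Example: a tiny 2x2 "CT slice" containing air-, water-, and bone-like voxels.
ct_slice = np.array([[-1000.0, 0.0], [500.0, 1500.0]])
print(hu_to_relative_electron_density(ct_slice))
```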

A synthetic CT image derived from an MRI image can be used to facilitate patient dose computation in radiation therapy treatment planning or to generate digitally-reconstructed radiographs for image guidance. The synthetic CT image may also be used for patient positioning during radiotherapy. Therefore, it is desirable to accurately generate a synthetic CT image from MRI image data so that patients can be spared the additional radiation exposure arising from CT imaging. Preferably, a synthetic CT image would accurately resemble a “real” CT image acquired by a CT scanner. In other words, a synthetic CT image should be as close as possible to the real CT image, e.g., in terms of relative voxel intensity values and contrast features. Generating an accurate synthetic CT image is not a simple task, because there is no direct mathematical relationship between CT image intensity values (CT numbers) and MRI intensity values.

Existing methods for automatic synthetic CT image generation can be roughly classified into three categories: tissue classification-based approaches, atlas-based approaches, and model-based approaches. Tissue classification or tissue segmentation methods first classify the MRI image into regions of different tissue types (e.g., air, fat, soft tissue, or bone), and then use bulk density assignments to assign a different CT number to each tissue type. Automatic tissue segmentation is a difficult problem, especially when using traditional MRI images. For instance, bone and air are inseparable or indistinguishable in traditional MRI images. Thus, more sophisticated MRI sequences (e.g., MRI sequences capable of acquiring images of a higher resolution) are often needed for more accurate tissue segmentation. Even if tissue segmentation is possible, dividing the anatomy into a few tissue types provides only a very coarse approximation of the actual anatomy. For example, the prostate and bladder are both soft tissues, but their CT intensities are usually different.
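
As an illustration of the bulk density assignment step described above, the following Python sketch replaces each tissue label of a hypothetical MRI segmentation with a single representative CT number; the label values and bulk HU values are assumptions chosen for illustration only.

```python
import numpy as np

# Hypothetical tissue labels and illustrative bulk CT numbers (HU).
AIR, FAT, SOFT_TISSUE, BONE = 0, 1, 2, 3
BULK_HU = {AIR: -1000.0, FAT: -100.0, SOFT_TISSUE: 40.0, BONE: 700.0}

def bulk_density_assignment(label_map):
    """Replace each tissue label with one bulk CT number."""
    pseudo_ct = np.zeros(label_map.shape, dtype=np.float32)
    for label, hu in BULK_HU.items():
        pseudo_ct[label_map == label] = hu
    return pseudo_ct

labels = np.array([[AIR, FAT], [SOFT_TISSUE, BONE]])
print(bulk_density_assignment(labels))
```

This coarse mapping is exactly why the text notes that two different soft tissues (e.g., prostate and bladder) end up with the same assigned CT number.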

Atlas-based approaches are also known as registration-based approaches. In the context of synthetic CT generation, the term “atlas” often refers to a dataset comprising paired atlas MRI and CT images. In an example, the pairing of MRI and CT images can be performed by image registration. To generate a synthetic CT image from a given original MRI image, an atlas-based method first aligns each atlas MRI image from the “atlas” to the original MRI image through computing an image registration transformation (usually a deformable or nonlinear image registration is needed, instead of a simple linear registration). The computed registration transformation can then be applied to map the corresponding atlas CT image to the original MRI image and generate a deformed atlas CT. The deformed atlas CT image provides an estimation of the real CT image (e.g., a synthetic CT image).

Normally, to achieve better accuracy, multiple deformed atlas CTs are computed, and the multiple deformed atlas CTs can be combined together, such as by averaging, to generate a final synthetic CT image corresponding to the original MRI image. However, most existing image registration methods can handle only single-channel MRI images. Also, if a large number of atlases is used, the computation time may increase because the original MRI image has to be registered to each atlas MRI image. The process is slow because these techniques require continuous access to the full atlas database, for the image registration operations involve all the atlas MRI images and atlas CT images. In an example, atlas selection can be used to select a subset of all the atlas MRI images and atlas CT images.
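
The multi-atlas workflow described above can be summarized in a short Python sketch. The register and warp functions below are placeholders (a real pipeline would call a registration toolkit for the deformable transformation and resampling), and the deformed atlas CTs are fused by simple averaging, as in the text.

```python
import numpy as np

def register(atlas_mri, original_mri):
    # Placeholder: a real pipeline would compute a deformable (nonlinear)
    # transformation aligning atlas_mri to original_mri here.
    return None

def warp(image, transformation):
    # Placeholder: a real pipeline would resample `image` through the
    # computed transformation; the identity is used here for illustration.
    return image

def atlas_based_synthetic_ct(original_mri, atlas_pairs):
    """Register each atlas MRI to the original MRI, apply the transformation
    to the paired atlas CT, and fuse the deformed atlas CTs by averaging.
    `atlas_pairs` is a list of (atlas_mri, atlas_ct) arrays."""
    deformed_cts = []
    for atlas_mri, atlas_ct in atlas_pairs:
        transformation = register(atlas_mri, original_mri)
        deformed_cts.append(warp(atlas_ct, transformation))
    return np.mean(deformed_cts, axis=0)
```

The loop over every atlas pair also illustrates why the computation time of such methods grows with the number of atlases.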

Model-based or learning-based approaches aim to build a predictive model from available training data. Once the predictive model has been trained, it can then be applied to any new MRI image of the same type (e.g., any new MRI image acquired using an MRI sequence that is the same as that used to acquire the training data) in order to predict a corresponding CT image. The training process typically uses a supervised learning algorithm. Specifically, a supervised learning algorithm uses a known set of input data and known responses or outputs to that data, and then trains a model to generate reasonable predictions for the response to new data. For synthetic CT prediction, a regression (instead of classification) model is needed, since the response variables (e.g., the CT values) are real numbers. Existing model-based approaches and traditional learning algorithms also require extraction of certain types of “features” to use as inputs to the model. Existing methods differ in the type of features each method may use as an input to its regression model. These existing methods also require manually designed features. Furthermore, existing methods predict the CT number of a single voxel. Thus, these methods must be applied in a sliding-window fashion (e.g., by applying a model to a series of sub-regions of an image), or voxel-by-voxel fashion, to generate a complete synthetic CT image. Such sliding-window approaches often require significant computing time and resources in order to generate a complete synthetic CT image.
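
A minimal Python sketch of the sliding-window, voxel-by-voxel prediction described above is shown below; it assumes a hypothetical trained regressor exposing a scikit-learn-style predict method, and uses raw patch intensities as the hand-crafted features. The nested per-voxel loop is what makes such approaches slow.

```python
import numpy as np

def extract_patch_features(mri, center, half=2):
    """Hand-crafted features from a small patch around one voxel; here simply
    the flattened raw intensities of a (2*half+1) x (2*half+1) patch."""
    r, c = center
    patch = mri[r - half:r + half + 1, c - half:c + half + 1]
    return patch.ravel()

def sliding_window_synthetic_ct(mri, regressor, half=2):
    """Predict one CT number per voxel (one model call per voxel)."""
    synthetic = np.zeros_like(mri, dtype=np.float32)
    rows, cols = mri.shape
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            features = extract_patch_features(mri, (r, c), half)
            synthetic[r, c] = regressor.predict(features[None, :])[0]
    return synthetic
```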

Accordingly, there is a need for new systems and methods using artificial intelligence techniques to generate synthetic CT images from other images, such as MRI images.

SUMMARY

In an aspect, the disclosure can feature a computer-implemented method for generating a synthetic image of an anatomical portion based on an origin image of the anatomical portion acquired by an imaging device using a first imaging modality. The method can include receiving the origin image of the anatomical portion acquired by the imaging device using the first imaging modality, and receiving a convolutional neural network model trained for predicting the synthetic image based on the origin image. The method can also include converting, by at least one processor, the origin image to the synthetic image through the convolutional neural network model. The synthetic image can resemble a second imaging modality depiction of the anatomical portion, wherein the second imaging modality can differ from the first imaging modality. The first imaging modality can be Magnetic Resonance Imaging, and the second imaging modality can be Computed Tomography. The first imaging modality and the second imaging modality can be selected from the group of Magnetic Resonance Imaging, Computed Tomography, ultrasound imaging, Positron Emission Tomography, and Single-Photon Emission Computed Tomography. The method can also include receiving a plurality of training origin images acquired using the first imaging modality, receiving a plurality of training destination images acquired using the second imaging modality, wherein each training destination image can correspond to a training origin image, determining a convolutional neural network architecture, and training the convolutional neural network model using the training origin images and corresponding training destination images. The origin image can be a two-dimensional image and the synthetic image can be a two-dimensional image. The origin image can include a stack of two-dimensional images and the synthetic image can include a corresponding stack of two-dimensional images. The origin image can be a three-dimensional volume and the synthetic image can be a three-dimensional volume. The origin image can be a three-dimensional image, and generating the synthetic image can include selecting a plurality of stacks of adjacent two-dimensional images from the three-dimensional image, converting each stack of adjacent two-dimensional images to a stack of synthetic two-dimensional images using the convolutional neural network model, and determining, by the at least one processor, the synthetic image by aggregating the stacks of synthetic two-dimensional images. The origin image can be a three-dimensional image, and generating the synthetic image can include creating a first stack of two-dimensional images from a first plane of the three-dimensional image and a second stack of two-dimensional images from a second plane of the three-dimensional image, converting the first stack and the second stack of two-dimensional images to a first stack and a second stack of synthetic two-dimensional images using the convolutional neural network model, and determining the synthetic image by aggregating the first stack and the second stack of synthetic two-dimensional images. The adjacent two-dimensional images can be in the same plane and carry dependent structure information in an axis orthogonal to the plane. The plurality of stacks of adjacent two-dimensional images can partially overlap, and aggregating the stacks of synthetic two-dimensional images can include averaging overlapping synthetic two-dimensional images. Converting the first stack of two-dimensional images to the first stack of synthetic two-dimensional images can use a first convolutional neural network model, and converting the second stack of two-dimensional images to the second stack of synthetic two-dimensional images can use a second convolutional neural network model. The first convolutional neural network model and the second convolutional neural network model can correspond to two different anatomical planes. The origin image can include multi-channel images of the anatomical portion acquired using different acquisition settings or using different acquisition channels of the imaging device. The multi-channel images can include T1-weighted Magnetic Resonance images and T2-weighted Magnetic Resonance images. The convolutional neural network model can include a first component configured to determine a feature map of the origin image and a second component configured to determine the synthetic image from the feature map. The first component can include a plurality of encoding layers and the second component can include a plurality of decoding layers. The first component can be configured to reduce the size of the feature map through down-sampling, and the second component can be configured to increase the size of the feature map through up-sampling. The convolutional neural network can include a plurality of convolutional layers, wherein the set of model parameters can include learnable filter weights used by the plurality of convolutional layers. The convolutional neural network can include 10 to 500 layers. Training the convolutional neural network model can include converting the training origin images to synthetic images using the convolutional neural network model, determining a difference between the synthetic images and the corresponding training destination images, and updating the set of model parameters of the convolutional neural network model based on the difference. The training can be completed when the difference between the synthetic image and the destination image is smaller than a predetermined threshold. The difference can be measured by a loss function calculated based on the synthetic image and the destination image.
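
One way to read the slice-stack aggregation described in this aspect is the following Python sketch, in which overlapping stacks of adjacent 2D slices are converted one stack at a time and overlapping synthetic slices are averaged. The model callable, stack size, and step are placeholders for illustration, not limitations of the disclosure.

```python
import numpy as np

def convert_3d_by_slice_stacks(volume, model, stack_size=3, step=1):
    """Convert a 3D origin volume (shape [depth, height, width]) by feeding
    overlapping stacks of adjacent 2D slices to `model`, then average the
    overlapping synthetic slices. `model` is assumed to map a stack of
    adjacent 2D slices to a stack of synthetic 2D slices of the same shape."""
    depth = volume.shape[0]
    accumulated = np.zeros_like(volume, dtype=np.float32)
    counts = np.zeros(depth, dtype=np.float32)
    for start in range(0, depth - stack_size + 1, step):
        stack = volume[start:start + stack_size]        # adjacent 2D slices
        synthetic_stack = model(stack)                   # CNN prediction
        accumulated[start:start + stack_size] += synthetic_stack
        counts[start:start + stack_size] += 1.0
    return accumulated / counts[:, None, None]           # average the overlaps

# Example with a stand-in "model" (the identity) and a random volume:
# convert_3d_by_slice_stacks(np.random.rand(16, 64, 64), model=lambda s: s)
```

The same pattern extends to the two-plane variant described above: run the procedure once per anatomical plane (optionally with a per-plane model) and average the resulting synthetic volumes.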

In an aspect, the disclosure can feature a system for generating a synthetic image of an anatomical portion based on an origin image of the anatomical portion acquired by an imaging device using a first imaging modality. The system can include an input interface configured to receive the origin image of the anatomical portion acquired by the imaging device using the first imaging modality, and to receive a convolutional neural network model trained for predicting the synthetic image based on the origin image. The system can also include at least one storage device configured to store the origin image and the convolutional neural network model. The system can also include an image processor configured to convert the origin image to the synthetic image through the convolutional neural network model. The synthetic image (e.g., a pseudo-CT image) can be stored in the at least one storage device. The synthetic image can resemble a second imaging modality depiction of the anatomical portion, wherein the second imaging modality (e.g., CT) can differ from the first imaging modality. The origin image can include multi-channel images of the anatomical portion acquired using different acquisition settings or using different acquisition channels of the imaging device. The first imaging modality can be Magnetic Resonance Imaging, and the second imaging modality can be Computed Tomography. The first imaging modality and the second imaging modality can be selected from among the group of Magnetic Resonance Imaging, Computed Tomography, ultrasound imaging, Positron Emission Tomography, and Single-Photon Emission Computed Tomography. The input interface can be further configured to receive a plurality of training origin images acquired using the first imaging modality, and to receive a plurality of training destination images acquired using the second imaging modality, wherein each training destination image can correspond to a training origin image. The image processor can be further configured to train the convolutional neural network model using the training origin images and corresponding training destination images. The origin image can be a two-dimensional image and the synthetic image can be a two-dimensional image. The origin image can include a stack of two-dimensional images and the synthetic image can include a corresponding stack of two-dimensional images. The origin image can be a three-dimensional volume and the synthetic image can be a three-dimensional volume. The origin image can be a three-dimensional image, and the image processor can be further configured to select a plurality of stacks of adjacent two-dimensional images from the three-dimensional image, convert each stack of adjacent two-dimensional images to a stack of synthetic two-dimensional images using the convolutional neural network model, and determine the synthetic image by aggregating the stacks of synthetic two-dimensional images. The adjacent two-dimensional images can be in the same plane and can carry dependent structure information in an axis orthogonal to the plane. The plurality of stacks of adjacent two-dimensional images can partially overlap, and aggregating the stacks of synthetic two-dimensional images can include averaging overlapping synthetic two-dimensional images. The origin image can be a three-dimensional image, and the image processor can be further configured to create a first stack of two-dimensional images from a first plane of the three-dimensional image and a second stack of two-dimensional images from a second plane of the three-dimensional image, convert the first stack and the second stack of two-dimensional images to a first stack and a second stack of synthetic two-dimensional images using the convolutional neural network model, and determine the synthetic image by aggregating the first stack and the second stack of synthetic two-dimensional images. The image processor can be configured to convert the first stack of two-dimensional images to the first stack of synthetic two-dimensional images using a first convolutional neural network model, and to convert the second stack of two-dimensional images to the second stack of synthetic two-dimensional images using a second convolutional neural network model, wherein the first convolutional neural network model and the second convolutional neural network model can correspond to two different anatomical planes. The origin image can include multi-channel images of the anatomical portion acquired using different acquisition channels of the imaging device. The multi-channel images can include T1-weighted Magnetic Resonance images and T2-weighted Magnetic Resonance images. The convolutional neural network model can include a first component configured to determine a feature map of the origin image and a second component configured to determine the synthetic image from the feature map. The first component can include a plurality of encoding layers and the second component can include a plurality of decoding layers. The first component can be configured to reduce the size of the feature map through down-sampling, and the second component can be configured to increase the size of the feature map through up-sampling. The convolutional neural network can include a plurality of convolutional layers, wherein the set of model parameters can include learnable filter weights used by the plurality of convolutional layers. The convolutional neural network can include 10 to 500 layers. The image processor can be further configured to convert the training origin images to synthetic images using the convolutional neural network model, determine a difference between the synthetic images and the corresponding training destination images, and update the set of model parameters based on the difference. The difference can be measured by a loss function calculated based on the synthetic images and the corresponding training destination images.

In an aspect, the disclosure can feature a non-transitory computer-readable medium containing instructions that, when executed by at least one processor, cause the at least one processor to perform a method for generating a synthetic image of an anatomical portion based on an origin image of the anatomical portion acquired by an imaging device using a first imaging modality. The method can include receiving the origin image of the anatomical portion acquired by the imaging device using the first imaging modality, and receiving a convolutional neural network model trained for predicting the synthetic image based on the origin image. The method can also include converting the origin image to the synthetic image through the convolutional neural network model. The synthetic image can resemble a second imaging modality depiction of the anatomical portion, wherein the second imaging modality (e.g., a CT) can differ from the first imaging modality. The method can also include receiving a plurality of training origin images acquired using the first imaging modality, receiving a plurality of training destination images acquired using the second imaging modality, each training destination image corresponding to a training origin image, determining a convolutional neural network architecture, and training the convolutional neural network model using the training origin images and corresponding training destination images. The origin image can be a three-dimensional image, and generating the synthetic image can further include selecting a plurality of stacks of adjacent two-dimensional images from the three-dimensional image, converting each stack of adjacent two-dimensional images to a stack of synthetic two-dimensional images using the convolutional neural network model, and determining, by the at least one processor, the synthetic image by aggregating the stacks of synthetic two-dimensional images. The adjacent two-dimensional images can be in the same plane and carry dependent structure information in an axis orthogonal to the plane. The plurality of stacks of adjacent two-dimensional images can partially overlap, and aggregating the stacks of synthetic two-dimensional images can include averaging overlapping synthetic two-dimensional images. The origin image can be a three-dimensional image, and generating the synthetic image can include creating a first stack of two-dimensional images from a first plane of the three-dimensional image and a second stack of two-dimensional images from a second plane of the three-dimensional image, converting the first stack and the second stack of two-dimensional images to a first stack and a second stack of synthetic two-dimensional images using the convolutional neural network model, and determining the synthetic image by aggregating the first stack and the second stack of synthetic two-dimensional images. Converting the first stack of two-dimensional images to the first stack of synthetic two-dimensional images can include using a first convolutional neural network model, and converting the second stack of two-dimensional images to the second stack of synthetic two-dimensional images can include using a second convolutional neural network model. The first convolutional neural network model and the second convolutional neural network model can correspond to two different anatomical planes. The origin image can include multi-channel images of the anatomical portion acquired using different acquisition channels of the imaging device. The multi-channel images can include T1-weighted Magnetic Resonance images and T2-weighted Magnetic Resonance images. The convolutional neural network model can include a first component configured to determine a feature map of the origin image and a second component configured to determine the synthetic image from the feature map. The first component can include a plurality of encoding layers and the second component can include a plurality of decoding layers. The first component can be configured to reduce the size of the feature map through down-sampling, and the second component can be configured to increase the size of the feature map through up-sampling. The convolutional neural network can include a plurality of convolutional layers, wherein the set of model parameters can include learnable filter weights used by the plurality of convolutional layers. The convolutional neural network can include 10 to 500 layers. Training the set of model parameters of the convolutional neural network model can include converting the training origin images to synthetic images using the convolutional neural network model, determining a difference between the synthetic images and the corresponding training destination images, and updating the set of model parameters based on the difference. The difference can be measured by a loss function calculated based on the synthetic images and the corresponding training destination images.

In an aspect, the disclosure can feature a computer-implemented method for generating a convolutional neural network model for predicting a synthetic image of an anatomical portion based on an origin image of the anatomical portion acquired by an imaging device using a first imaging modality. The method can include receiving a plurality of training origin images acquired using the first imaging modality. The method can also include receiving a plurality of training destination images acquired using a second imaging modality, each training destination image corresponding to a training origin image. The method can also include training, by at least one processor, a set of model parameters of the convolutional neural network model using the training origin images and corresponding training destination images. The first imaging modality can be Magnetic Resonance Imaging, and the second imaging modality can be Computed Tomography. The first imaging modality and the second imaging modality can be selected from among the group of Magnetic Resonance Imaging, Computed Tomography, ultrasound imaging, Positron Emission Tomography, and Single-Photon Emission Computed Tomography. The training origin images can include multi-channel images of the anatomical portion acquired using different acquisition channels of the imaging device. The multi-channel images can include T1-weighted Magnetic Resonance images and T2-weighted Magnetic Resonance images. The convolutional neural network model can include a first component configured to determine a feature map of the origin image and a second component configured to determine the synthetic image from the feature map. The first component can include a plurality of encoding layers and the second component can include a plurality of decoding layers. The first component can be configured to reduce the size of the feature map through down-sampling, and the second component can be configured to increase the size of the feature map through up-sampling. The convolutional neural network can include a plurality of convolutional layers, and the set of model parameters can include learnable filter weights used by the plurality of convolutional layers. The convolutional neural network can include 10 to 500 layers. Training the set of model parameters of the convolutional neural network model can include converting the training origin images to synthetic images using the convolutional neural network model, determining a difference between the synthetic images and the corresponding training destination images, and updating the set of model parameters based on the difference. The difference can be measured by a loss function calculated based on the synthetic images and the corresponding training destination images.
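
To make the training loop in this aspect concrete, below is a minimal PyTorch-style sketch written under several assumptions: the toy encoder-decoder is a stand-in rather than the architecture disclosed herein, the L1 loss is only one example of a loss function, and random tensors stand in for paired training origin (e.g., MRI) and destination (e.g., CT) slices.

```python
import torch
import torch.nn as nn

class TinyEncoderDecoder(nn.Module):
    """Illustrative encoder-decoder CNN: the encoder reduces the feature-map
    size by down-sampling, and the decoder restores it by up-sampling."""
    def __init__(self, in_channels=1, out_channels=1):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                  # down-sampling
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),      # up-sampling
            nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, out_channels, 3, padding=1),        # synthetic image
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train_step(model, optimizer, origin_batch, destination_batch):
    """One parameter update: predict synthetic images, compute a loss against
    the destination images, and back-propagate to update model parameters."""
    optimizer.zero_grad()
    synthetic = model(origin_batch)
    loss = nn.functional.l1_loss(synthetic, destination_batch)  # example loss
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with random stand-in data (real training would use paired MRI/CT slices):
model = TinyEncoderDecoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
origin = torch.randn(4, 1, 64, 64)        # e.g., MRI slices
destination = torch.randn(4, 1, 64, 64)   # e.g., corresponding CT slices
print(train_step(model, optimizer, origin, destination))
```

In this reading, training is considered complete when the measured loss falls below a predetermined threshold, as stated above.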

Additional objects and advantages of the embodiments will be set forth in part in the description that follows, and in part will be obvious from the description, or may be learned by practice of the embodiments. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the claims.

As used herein, the terms “comprises,” “comprising,” or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate the disclosed embodiments and, together with the description, serve to explain the disclosed principles. In the drawings:

FIG. 1A illustrates an exemplary radiotherapy system, according to some embodiments of the present disclosure.

FIG. 1B illustrates an exemplary radiation therapy system that can include a radiation therapy output configured to provide a therapy beam.

FIG. 1C illustrates an exemplary system including a combined radiation therapy system and an imaging system, such as a computed tomography (CT) imaging system.

FIG. 1D illustrates a partially cut-away view of an exemplary system including a combined radiation therapy system and an imaging system, such as a nuclear magnetic resonance (MR) imaging system.

FIG. 2 illustrates an orthogonal view of an exemplary multi-leaf collimator configuration for shaping, directing, or modulating an intensity of a radiation therapy beam.

FIG. 3 illustrates an exemplary Gamma knife radiation therapy system.

FIGS. 4A and 4B depict the differences between an exemplary MRI image and a corresponding CT image.

FIG. 4C depicts exemplary synthetic CT images generated from acquired MRI images.

FIG. 5 depicts an exemplary image conversion system, according to some embodiments of the present disclosure.

FIG. 6 illustrates an exemplary image pre-processing process, according to some embodiments of the present disclosure.

FIG. 7A illustrates the operation of an exemplary convolutional layer of the predictive model shown in FIG. 5, according to some embodiments of the present disclosure.

FIG. 7B illustrates the operation of an exemplary stack of convolutional layers of the predictive model shown in FIG. 5, according to some embodiments of the present disclosure.

FIG. 7C illustrates the operation of exemplary max pooling and unpooling layers of the predictive model shown in FIG. 5, according to some embodiments of the present disclosure.

FIGS. 7D and 7E depict an exemplary convolutional neural network used by the image conversion system shown in FIG. 5, according to some embodiments of the present disclosure.

FIG. 7F depicts exemplary feature maps generated by selected layers of the neural network shown in FIGS. 7D and 7E, according to some embodiments of the present disclosure.

FIG. 8 is a flowchart illustrating an exemplary training process for training a CNN model, according to some embodiments of the present disclosure.

FIG. 9A is a flowchart illustrating an exemplary image conversion process using a trained CNN model obtained through the process of FIG. 8, according to some embodiments of the present disclosure.

FIG. 9B is a flowchart illustrating an exemplary 3D image conversion process using at least one trained CNN model obtained through the process of FIG. 8, according to some embodiments of the present disclosure.

DETAILED DESCRIPTION

Reference will now be made in detail to the exemplary embodiments with reference to the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the scope of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be interpreted as open ended, in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. And the singular forms “a,” “an,” and “the” are intended to include plural references, unless the context clearly dictates otherwise. The term “exemplary” is used in the sense of “example,” rather than “ideal.”

The disclosed embodiments use limited data to initialize the convolutional neural network using parameters of existing neural networks trained for image classification, taking advantage of the principle of transfer learning. The disclosed embodiments advantageously may not require inter-subject image registration, either linear or deformable, and may directly learn the mapping between the MRI images and the corresponding CT images. The disclosed embodiments may advantageously offer better accuracy than an atlas-based method using patch refinement and fusion. This is because the atlas-based method usually relies on patch comparison to find similar atlas candidates, as is also common in other atlas- or patch-based methods proposed in the literature. A small, local patch has limited image information. Therefore, using raw image intensities of a patch as features may suffer from large redundancy in the data and reduce the discrimination power. In contrast, the disclosed embodiments may advantageously automatically learn a hierarchy of image features at different scales and complexity levels from a full image.
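
A minimal sketch of such a transfer-learning initialization is shown below, assuming PyTorch modules whose transferable layers share parameter names and shapes (for example, an encoder that mirrors part of a pretrained classification network); it is an illustration of the idea rather than the specific initialization used by the disclosed embodiments.

```python
import torch.nn as nn

def initialize_from_pretrained(target: nn.Module, pretrained: nn.Module):
    """Copy every parameter of `pretrained` whose name and shape match a
    parameter of `target`; all other parameters keep their fresh
    initialization. A minimal form of transfer-learning initialization."""
    target_state = target.state_dict()
    copied = []
    for name, tensor in pretrained.state_dict().items():
        if name in target_state and target_state[name].shape == tensor.shape:
            target_state[name] = tensor.clone()
            copied.append(name)
    target.load_state_dict(target_state)
    return copied  # names of layers initialized from the pretrained network
```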

One or more methods consistent with the disclosed embodiments have been tested on 18 patients, and the results compare favorably with other results reported in the literature, such as results obtained using existing fuzzy c-means clustering methods, methods using atlases for registration, methods using a Gaussian mixture regression model, and atlas registration followed by patch fusion. In addition to improving accuracy, the disclosed embodiments also result in a shorter computation time at deployment. For example, even if training a model (e.g., a deep convolutional neural network (DCNN)) can take days, the training only needs to be done once, and acceleration of the training process is possible through the use of multiple GPUs, cloud computers, or supercomputers. Applying the trained model to create synthetic images for a new patient may take only a few seconds on a single GPU. In comparison, existing model-based or atlas-based methods can take much longer (for example, minutes or hours).

The disclosed embodiments can also accommodate a large quantity (hundreds of pairs of origin and destination images) of training data. Neural networks can greatly benefit from large amounts of data due to their high model capacity. Thus, the accuracy of the disclosed embodiments may increase as additional training data becomes available. Increasing the amount of training data may increase training time, but the size of the final neural network and the synthetic image generation time may remain the same. In contrast, increasing the training data used by existing atlas-based methods or other model-based methods (such as existing Gaussian process models) may be impractical, because these existing methods require retention of all training data. In addition, the computation time of these methods is often directly proportional to the number of atlases used. Furthermore, the accuracy of atlas-based methods can quickly saturate, and, as a result, an atlas selection procedure is often required to avoid degradation in accuracy when using a large number of atlases. For the disclosed embodiments, however, the use of increased training data may increase the accuracy of the final synthetic image generated by the trained neural network model.

FIG. 1A illustrates an exemplary radiotherapy system 10 for providing radiation therapy to a patient. The radiotherapy system 10 includes an image processing device 12. The image processing device 12 may be connected to a network 20. The network 20 may be connected to the Internet 22. The network 20 can connect the image processing device 12 with one or more of a database 24, a hospital database 26, an oncology information system (OIS) 28, a radiation therapy device 30, an image acquisition device 32, a display device 34, and a user interface 36. The image processing device 12 is configured to generate radiation therapy treatment plans 42 to be used by the radiation therapy device 30.

The image processing device 12 may include a memory device 16, a processor 14, and a communication interface 18. The memory device 16 may store computer-executable instructions, such as an operating system 43, radiation therapy treatment plans 42 (e.g., original treatment plans, adapted treatment plans, and the like), software programs 44 (e.g., artificial intelligence, deep learning, neural networks, radiotherapy treatment plan software), and any other computer-executable instructions to be executed by the processor 14. In an embodiment, the software programs 44 may convert medical images of one format (e.g., MRI) to another format (e.g., CT) by producing synthetic images, such as a pseudo-CT image. For instance, the software programs 44 may include image processing programs to train a predictive model for converting a medical image 46 in one modality (e.g., an MRI image) into a synthetic image of a different modality (e.g., a pseudo-CT image); alternatively, the trained predictive model may convert a CT image into an MRI image. The memory device 16 may store data, including medical images 46, patient data 45, and other data required to create and implement a radiation therapy treatment plan 42.

In addition to the memory 16 storing the software programs 44, it is contemplated that the software programs 44 may be stored on a removable computer medium, such as a hard drive, a computer disk, a CD-ROM, a DVD, an HD, a Blu-Ray DVD, a USB flash drive, an SD card, a memory stick, or any other suitable medium; and the software programs 44, when downloaded to the image processing device 12, may be executed by the image processor 14.

The processor 14 may be communicatively coupled to the memory device 16, and the processor 14 may be configured to execute computer-executable instructions stored thereon. The processor 14 may send medical images 46 to, or receive them from, the memory 16. For example, the processor 14 may receive medical images 46 from the image acquisition device 32 via the communication interface 18 and the network 20 to be stored in the memory 16. The processor 14 may also send medical images 46 stored in the memory 16 via the communication interface 18 to the network 20, to be stored in either the database 24 or the hospital database 26.

Further, the processor 14 may utilize software programs 44 (e.g., treatment planning software) along with the medical images 46 and patient data 45 to create the radiation therapy treatment plan 42. Medical images 46 may include information such as imaging data associated with a patient anatomical region, organ, or volume of interest segmentation data. Patient data 45 may include information such as (1) functional organ modeling data (e.g., serial versus parallel organs, appropriate dose response models, etc.); (2) radiation dosage data (e.g., dose-volume histogram (DVH) information); or (3) other clinical information about the patient and course of treatment (e.g., other surgeries, chemotherapy, previous radiotherapy, etc.).

In addition, the processor 14 may utilize software programs to generate intermediate data, such as updated parameters to be used, for example, by a neural network model, or to generate intermediate 2D or 3D images, which may then be stored in the memory 16. The processor 14 may then transmit the executable radiation therapy treatment plan 42 via the communication interface 18 over the network 20 to the radiation therapy device 30, where the radiation therapy plan will be used to treat a patient with radiation. In addition, the processor 14 may execute software programs 44 to implement functions such as image conversion, image segmentation, deep learning, neural networks, and artificial intelligence. For instance, the processor 14 may execute software programs 44 that train or contour a medical image; such software 44, when executed, may train a boundary detector or utilize a shape dictionary.

The processor 14 may be a processing device, and may include one or more general-purpose processing devices such as a microprocessor, a central processing unit (CPU), a graphics processing unit (GPU), an accelerated processing unit (APU), or the like. More particularly, the processor 14 may be a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, a processor implementing other instruction sets, or processors implementing a combination of instruction sets. The processor 14 may also be implemented by one or more special-purpose processing devices such as an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), a System on a Chip (SoC), or the like. As would be appreciated by those skilled in the art, in some embodiments, the processor 14 may be a special-purpose processor, rather than a general-purpose processor. The processor 14 may include one or more known processing devices, such as a microprocessor from the Pentium™, Core™, Xeon™, or Itanium® family manufactured by Intel™, from the Turion™, Athlon™, Sempron™, Opteron™, FX™, or Phenom™ family manufactured by AMD™, or any of various processors manufactured by Sun Microsystems. The processor 14 may also include graphical processing units such as a GPU from the GeForce®, Quadro®, or Tesla® family manufactured by Nvidia™, from the GMA or Iris™ family manufactured by Intel™, or from the Radeon™ family manufactured by AMD™. The processor 14 may also include accelerated processing units such as the Desktop A-4(6,8) Series manufactured by AMD™ or the Xeon Phi™ family manufactured by Intel™. The disclosed embodiments are not limited to any type of processor(s) otherwise configured to meet the computing demands of identifying, analyzing, maintaining, generating, and/or providing large amounts of data or manipulating such data to perform the methods disclosed herein. In addition, the term “processor” may include more than one processor, for example, a multi-core design or a plurality of processors each having a multi-core design. The processor 14 can execute sequences of computer program instructions, stored in the memory 16, to perform the various operations, processes, and methods that will be explained in greater detail below.

The memory device 16 can store medical images 46. In some embodiments, the medical images 46 may include one or more MRI images (e.g., 2D MRI, 3D MRI, 2D streaming MRI, 4D MRI, 4D volumetric MRI, 4D cine MRI, etc.), functional MRI images (e.g., fMRI, DCE-MRI, diffusion MRI), Computed Tomography (CT) images (e.g., 2D CT, cone beam CT, 3D CT, 4D CT), ultrasound images (e.g., 2D ultrasound, 3D ultrasound, 4D ultrasound), Positron Emission Tomography (PET) images, X-ray images, fluoroscopic images, radiotherapy portal images, Single-Photon Emission Computed Tomography (SPECT) images, computer-generated synthetic images (e.g., pseudo-CT images), and the like. Further, the medical images 46 may also include medical image data, for instance, training images, ground truth images, and contoured images. In an embodiment, the medical images 46 may be received from the image acquisition device 32. Accordingly, the image acquisition device 32 may include an MRI imaging device, a CT imaging device, a PET imaging device, an ultrasound imaging device, a fluoroscopic device, a SPECT imaging device, an integrated Linear Accelerator and MRI imaging device, or other medical imaging devices for obtaining the medical images of the patient. The medical images 46 may be received and stored in any type of data or any type of format that the image processing device 12 may use to perform operations consistent with the disclosed embodiments. The memory device 16 may be a non-transitory computer-readable medium, such as a read-only memory (ROM), a phase-change random access memory (PRAM), a static random access memory (SRAM), a flash memory, a random access memory (RAM), a dynamic random access memory (DRAM) such as synchronous DRAM (SDRAM), an electrically erasable programmable read-only memory (EEPROM), a static memory (e.g., flash memory, flash disk, static random access memory) as well as other types of random access memories, a cache, a register, a compact disc read-only memory (CD-ROM), a digital versatile disc (DVD) or other optical storage, a cassette tape, another magnetic storage device, or any other non-transitory medium that may be used to store information, including images, data, or computer-executable instructions (e.g., stored in any format), capable of being accessed by the processor 14, or any other type of computer device. The computer program instructions can be accessed by the processor 14, read from the ROM or any other suitable memory location, and loaded into the RAM for execution by the processor 14. For example, the memory 16 may store one or more software applications. Software applications stored in the memory 16 may include, for example, an operating system 43 for common computer systems as well as for software-controlled devices. Further, the memory 16 may store an entire software application, or only a part of a software application, that is executable by the processor 14. For example, the memory device 16 may store one or more radiation therapy treatment plans 42.

The image processing device 12 can communicate with the network 20 via the communication interface 18, which is communicatively coupled to the processor 14 and the memory 16. The communication interface 18 may provide communication connections between the image processing device 12 and radiotherapy system 10 components (e.g., permitting the exchange of data with external devices). For instance, the communication interface 18 may in some embodiments have appropriate interfacing circuitry to connect to the user interface 36, which may be a hardware keyboard, a keypad, or a touch screen through which a user may input information into the radiotherapy system 10.

Communication interface 18 may include, for example, a network adaptor, a cable connector, a serial connector, a USB connector, a parallel connector, a high-speed data transmission adaptor (e.g., fiber, USB 3.0, Thunderbolt, and the like), a wireless network adaptor (e.g., a WiFi adaptor), a telecommunication adaptor (e.g., 3G, 4G/LTE, and the like), and the like. Communication interface 18 may include one or more digital and/or analog communication devices that permit the image processing device 12 to communicate with other machines and devices, such as remotely located components, via the network 20.

The network 20 may provide the functionality of a local area network (LAN), a wireless network, a cloud computing environment (e.g., software as a service, platform as a service, infrastructure as a service, etc.), a client-server network, a wide area network (WAN), and the like. For example, the network 20 may be a LAN or a WAN that may include other systems S1 (38), S2 (40), and S3 (41). Systems S1, S2, and S3 may be identical to the image processing device 12 or may be different systems. In some embodiments, one or more of the systems in the network 20 may form a distributed computing/simulation environment that collaboratively performs the embodiments described herein. In some embodiments, one or more of the systems S1, S2, and S3 may include a CT scanner that obtains CT images (e.g., medical images 46). In addition, the network 20 may be connected to the Internet 22 to communicate with servers and clients that reside remotely on the Internet.

Therefore, network 20 can allow data transmission between the image processing device 12 and a number of various other systems and devices, such as the OIS 28, the radiation therapy device 30, and the image acquisition device 32. Further, data generated by the OIS 28 and/or the image acquisition device 32 may be stored in the memory 16, the database 24, and/or the hospital database 26. The data may be transmitted/received via network 20, through communication interface 18, in order to be accessed by the processor 14, as required.

The image processing device 12 may communicate with database 24 through network 20 to send/receive a plurality of various types of data stored on database 24. For example, database 24 may include machine data that is information associated with a radiation therapy device 30, image acquisition device 32, or other machines relevant to radiotherapy. Machine data information may include radiation beam size, arc placement, beam on and off time duration, control points, segments, multi-leaf collimator (MLC) configuration, gantry speed, MRI pulse sequence, and the like. Database 24 may be a storage device. One skilled in the art would appreciate that database 24 may include a plurality of devices located either in a central or a distributed manner.

In some embodiments, database 24 may include a processor-readable storage medium (not shown). While the processor-readable storage medium in an embodiment may be a single medium, the term “processor-readable storage medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of computer-executable instructions or data. The term “processor-readable storage medium” shall also be taken to include any medium that is capable of storing or encoding a set of instructions for execution by a processor and that causes the processor to perform any one or more of the methodologies of the present disclosure. The term “processor-readable storage medium” shall accordingly be taken to include, but not be limited to, solid-state memories, and optical and magnetic media. For example, the processor-readable storage medium can be one or more volatile, non-transitory, or non-volatile tangible computer-readable media.

Image processor 14 may communicate with database 24 to read images into memory 16 or store images from memory 16 to database 24. For example, the database 24 may be configured to store a plurality of images (e.g., 3D MRI, 4D MRI, 2D MRI slice images, CT images, 2D fluoroscopy images, X-ray images, raw data from MR scans or CT scans, Digital Imaging and Communications in Medicine (DICOM) data, etc.) that the database 24 received from image acquisition device 32. Database 24 may store data to be used by the image processor 14 when executing software program 44, or when creating radiation therapy treatment plans 42. The image processing device 12 may receive the imaging data 46 (e.g., 2D MRI slice images, CT images, 2D fluoroscopy images, X-ray images, 3D MRI images, 4D MRI images, etc.) from the database 24, the radiation therapy device 30 (e.g., an MRI-Linac), and/or the image acquisition device 32 to generate a treatment plan 42.

In an embodiment, the radiotherapy system 10 can include an image acquisition device 32 that can acquire medical images (e.g., Magnetic Resonance Imaging (MRI) images, 3D MRI, 2D streaming MRI, 4D volumetric MRI, Computed Tomography (CT) images, cone-beam CT, Positron Emission Tomography (PET) images, functional MRI images (e.g., fMRI, DCE-MRI, and diffusion MRI), X-ray images, fluoroscopic images, ultrasound images, radiotherapy portal images, single-photon emission computed tomography (SPECT) images, and the like) of the patient. Image acquisition device 32 may, for example, be an MRI imaging device, a CT imaging device, a PET imaging device, an ultrasound device, a fluoroscopic device, a SPECT imaging device, or any other suitable medical imaging device for obtaining one or more medical images of the patient. Images acquired by the image acquisition device 32 can be stored within database 24 as either imaging data and/or test data. By way of example, the images acquired by the image acquisition device 32 can also be stored by the image processing device 12, as medical image data 46 in memory 16.

In an embodiment, for example, the image acquisition device 32 may be integrated with the radiation therapy device 30 as a single apparatus (e.g., an MRI device combined with a linear accelerator, also referred to as an “MRI-Linac”). Such an MRI-Linac can be used, for example, to determine a location of a target organ or a target tumor in the patient, so as to direct radiation therapy accurately according to the radiation therapy treatment plan 42 to a predetermined target.

The image acquisition device 32 can be configured to acquire one or more images of the patient's anatomy for a region of interest (e.g., a target organ, a target tumor, or both). Each image, typically a 2D image or slice, can include one or more parameters (e.g., a 2D slice thickness, an orientation, and a location, etc.). In an example, the image acquisition device 32 can acquire a 2D slice in any orientation. For example, an orientation of the 2D slice can include a sagittal orientation, a coronal orientation, or an axial orientation. The processor 14 can adjust one or more parameters, such as the thickness and/or orientation of the 2D slice, to include the target organ and/or target tumor. In an example, 2D slices can be determined from information such as a 3D MRI volume. Such 2D slices can be acquired by the image acquisition device 32 in “real-time” while a patient is undergoing radiation therapy treatment, for example, when using the radiation therapy device 30, where “real-time” means acquiring the data in at least milliseconds or less.
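
As a simple illustration of determining 2D slices of different orientations from a 3D volume, the following Python sketch assumes a particular (hypothetical) axis ordering of the volume array; it is not tied to any specific acquisition device.

```python
import numpy as np

def extract_slice(volume, index, orientation="axial"):
    """Extract one 2D slice from a 3D volume; the volume is assumed to be
    indexed along the axial, coronal, and sagittal directions on axes 0, 1,
    and 2, respectively."""
    axes = {"axial": 0, "coronal": 1, "sagittal": 2}
    return np.take(volume, index, axis=axes[orientation])

volume = np.random.rand(120, 256, 256)                  # stand-in 3D MRI volume
axial_slice = extract_slice(volume, 60, "axial")        # shape (256, 256)
coronal_slice = extract_slice(volume, 128, "coronal")   # shape (120, 256)
```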

The image processing device 12 may generate and store radiation therapy treatment plans 42 for one or more patients. The radiation therapy treatment plans 42 may provide information about a particular radiation dose to be applied to each patient. The radiation therapy treatment plans 42 may also include other radiotherapy information, such as beam angles, dose-volume histogram information, the number of radiation beams to be used during therapy, the dose per beam, and the like.

The image processor 14 may generate the radiation therapy treatment plan 42 by using software programs 44, such as treatment planning software (e.g., Monaco®, manufactured by Elekta AB of Stockholm, Sweden). In order to generate the radiation therapy treatment plans 42, the image processor 14 may communicate with the image acquisition device 32 (e.g., a CT device, an MRI device, a PET device, an X-ray device, an ultrasound device, etc.) to access images of the patient and to delineate a target, such as a tumor. In some embodiments, the delineation of one or more organs at risk (OARs), such as healthy tissue surrounding the tumor or in close proximity to the tumor, may be required. Therefore, segmentation of the OAR may be performed when the OAR is close to the target tumor. In addition, if the target tumor is close to the OAR (e.g., prostate in near proximity to the bladder and rectum), then by segmenting the OAR from the tumor, the treatment planning device 110 may study the dose distribution not only in the target, but also in the OAR.

In order to delineate a target organ or a target tumor from the OAR, medical images, such as MRI images, CT images, PET images, fMRI images, X-ray images, ultrasound images, radiotherapy portal images, SPECT images, and the like, of the patient undergoing radiotherapy may be obtained non-invasively by the image acquisition device 32 to reveal the internal structure of a body part. Based on the information from the medical images, a 3D structure of the relevant anatomical portion may be obtained. In addition, during a treatment planning process, many parameters may be taken into consideration to achieve a balance between efficient treatment of the target tumor (e.g., such that the target tumor receives enough radiation dose for an effective therapy) and low irradiation of the OAR(s) (e.g., the OAR(s) receives as low a radiation dose as possible). Other parameters that may be considered include the location of the target organ and the target tumor, the location of the OAR, and the movement of the target in relation to the OAR. For example, the 3D structure may be obtained by contouring the target or contouring the OAR within each 2D layer or slice of an MRI or CT image and combining the contour of each 2D layer or slice. The contour may be generated manually (e.g., by a physician, dosimetrist, or health care worker) or automatically (e.g., using a program such as the Atlas-based auto-segmentation software, ABAS™, manufactured by Elekta AB of Stockholm, Sweden). In certain embodiments, the 3D structure of a target tumor or an OAR may be generated automatically by the treatment planning software.

After the target tumor and the OAR(s) have been located and delineated, a dosimetrist, physician, or healthcare worker may determine a dose of radiation to be applied to the target tumor, as well as any maximum amounts of dose that may be received by the OAR proximate to the tumor (e.g., left and right parotid, optic nerves, eyes, lens, inner ears, spinal cord, brain stem, and the like). After the radiation dose is determined for each anatomical structure (e.g., target tumor, OAR), a process known as inverse planning may be performed to determine one or more treatment plan parameters that would achieve the desired radiation dose distribution. Examples of treatment plan parameters include volume delineation parameters (e.g., which define target volumes, contour sensitive structures, etc.), margins around the target tumor and OARs, beam angle selection, collimator settings, and beam-on times. During the inverse-planning process, the physician may define dose constraint parameters that set bounds on how much radiation an OAR may receive (e.g., defining full dose to the tumor target and zero dose to any OAR; defining 95% of the dose to the target tumor; defining that the spinal cord, brain stem, and optic structures receive ≤45 Gy, ≤55 Gy, and <54 Gy, respectively). The result of inverse planning may constitute a radiation therapy treatment plan 42 that may be stored in memory 16 or database 24. Some of these treatment parameters may be correlated. For example, tuning one parameter (e.g., weights for different objectives, such as increasing the dose to the target tumor) in an attempt to change the treatment plan may affect at least one other parameter, which in turn may result in the development of a different treatment plan. Thus, the image processing device 12 can generate a tailored radiation therapy treatment plan 42 having these parameters in order for the radiation therapy device 30 to provide radiotherapy treatment to the patient.
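
As a small illustration of how such dose constraint parameters might be represented and checked programmatically, the Python sketch below uses the example limits quoted above with hypothetical structure masks; it is not part of the disclosed planning workflow and is not clinical guidance.

```python
import numpy as np

# Illustrative maximum-dose limits (Gy), following the example values in the text.
MAX_DOSE_GY = {"spinal_cord": 45.0, "brain_stem": 55.0, "optic_structures": 54.0}

def check_dose_constraints(dose, structure_masks):
    """For each OAR, report the maximum dose it receives and whether that
    exceeds its limit. `dose` is a dose grid in Gy; each mask is a boolean
    array of the same shape selecting the voxels of one structure."""
    report = {}
    for name, limit in MAX_DOSE_GY.items():
        max_dose = float(dose[structure_masks[name]].max())
        report[name] = {"max_dose_gy": max_dose, "exceeds_limit": max_dose > limit}
    return report
```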

In addition, the radiotherapy system 10 may include a display device 34 and a user interface 36. The display device 34 may include one or more display screens that display medical images, interface information, treatment planning parameters (e.g., contours, dosages, beam angles, etc.), treatment plans, a target, localization of a target and/or tracking of a target, or any related information to the user. The user interface 36 may be a keyboard, a keypad, a touch screen, or any type of device through which a user may input information to radiotherapy system 10. Alternatively, the display device 34 and the user interface 36 may be integrated into a device such as a tablet computer, e.g., Apple iPad®, Lenovo Thinkpad®, Samsung Galaxy®, etc.

Furthermore, any and all components of the radiotherapy system 10 may be implemented as a virtual machine (e.g., VMWare, Hyper-V, and the like). For instance, a virtual machine can be software that functions as hardware. Therefore, a virtual machine can include one or more virtual processors, one or more virtual memories, and one or more virtual communication interfaces that together function as hardware. For example, the image processing device 12, the OIS 28, or the image acquisition device 32 could be implemented as a virtual machine. Given the processing power, memory, and computational capability available, the entire radiotherapy system 10 could be implemented as a virtual machine.

FIG. 1B illustrates an exemplary radiation therapy device 102 that may include a radiation source, such as an X-ray source or a linear accelerator, a multi-leaf collimator (not shown), a couch 116, an imaging detector 114, and a radiation therapy output 104. The radiation therapy device 102 may be configured to emit a radiation beam 108 to provide therapy to a patient. The radiation therapy output 104 can include one or more attenuators or collimators, such as a multi-leaf collimator (MLC) as described in the illustrative example of FIG. 2, below.

Referring back to FIG. 1B, a patient can be positioned in a region 112, using a table or couch 116, to receive a radiation therapy dose according to a radiation therapy treatment plan. The radiation therapy output 104 can be mounted or attached to a gantry 106 or other mechanical support. One or more chassis motors (not shown) may rotate the gantry 106 and the radiation therapy output 104 around couch 116 when couch 116 is inserted into the treatment area. In an embodiment, gantry 106 may be continuously rotatable around couch 116 when the couch 116 is inserted into the treatment area. In another embodiment, gantry 106 may rotate to a predetermined position when the couch 116 is inserted into the treatment area. For example, the gantry 106 can be configured to rotate the therapy output 104 around an axis (“A”). Both the couch 116 and the radiation therapy output 104 can be independently moveable to other positions around the patient, such as moveable in a transverse direction (“T”), moveable in a lateral direction (“L”), or as rotation about one or more other axes, such as rotation about a transverse axis (indicated as “R”). A controller communicatively connected to one or more actuators (not shown) may control the couch 116 movements or rotations in order to properly position the patient in or out of the radiation beam 108 position according to a radiation therapy treatment plan. Because both the couch 116 and the gantry 106 are independently moveable from one another in multiple degrees of freedom, the patient can be positioned such that the radiation beam 108 can precisely target the tumor.

The coordinate system (including axes A, T, and L) shown in FIG. 1B can have an origin located at an isocenter 110. The isocenter can be defined as a location where the radiation therapy beam 108 intersects the origin of a coordinate axis, such as to deliver a prescribed radiation dose to a location on or within a patient. For example, the isocenter 110 can be defined as a location where the radiation therapy beam 108 intersects the patient for various rotational positions of the radiation therapy output 104 as positioned by the gantry 106 around the axis A.

Gantry 106 may also have an attached imaging detector 114. The imaging detector 114 is preferably located opposite to the radiation source 104, and in an example, the imaging detector 114 can be located within a field of the therapy beam 108.

The imaging detector 114 can be mounted on the gantry 106, preferably opposite the radiation therapy output 104, such as to maintain alignment with the therapy beam 108. The imaging detector 114 may rotate about the rotational axis as the gantry 106 rotates. In an embodiment, the imaging detector 114 can be a flat panel detector (e.g., a direct detector or a scintillator detector). In this manner, the imaging detector 114 can be used to monitor the therapy beam 108, or the imaging detector 114 can be used for imaging the patient's anatomy, such as portal imaging. The control circuitry of radiotherapy device 102 may be integrated within the radiotherapy system 10 or remote from it, and is functionally represented by image processing device 12 as shown in FIG. 1A.

In an illustrative example, one or more of the couch 116, the therapy output 104, or the gantry 106 can be automatically positioned, and the therapy output 104 can establish the therapy beam 108 according to a specified dose for a particular therapy delivery instance. A sequence of therapy deliveries can be specified according to a radiation therapy treatment plan, such as using one or more different orientations or locations of the gantry 106, couch 116, or therapy output 104. The therapy deliveries can occur sequentially, but can intersect in a desired therapy locus on or within the patient, such as at the isocenter 110. A prescribed cumulative dose of radiation therapy can thereby be delivered to the therapy locus while damage to tissue nearby the therapy locus is reduced or avoided.

FIG. 1C illustrates an exemplary radiation therapy device 102 that may combine a linear accelerator and an imaging system, such as a computed tomography (CT) imaging system. The CT imaging system can include an imaging X-ray source 118, such as one providing X-ray energy in a kiloelectron-Volt (keV) energy range. The imaging X-ray source 118 provides a fan-shaped and/or a conical beam 120 directed to an imaging detector 122, such as a flat panel detector. The radiation therapy system 102 can be similar to the system 102 described in relation to FIG. 1B, such as including a radiation therapy output 104, a gantry 106, a platform 116, and another flat panel detector 114. The X-ray source 118 can provide a comparatively lower-energy X-ray diagnostic beam for imaging.

In the illustrative example of FIG. 1C, the radiation therapy output 104 and the X-ray source 118 can be mounted on the same rotating gantry 106, rotationally separated from each other by 90 degrees. In another example, two or more X-ray sources can be mounted along the circumference of the gantry 106, such as each having its own detector arrangement to provide multiple angles of diagnostic imaging concurrently. Similarly, multiple radiation therapy outputs 104 can be provided.

FIG. 1D depicts an exemplary radiation therapy system 300 that can combine a radiation therapy device and an imaging system, such as a nuclear magnetic resonance (MR) imaging system (e.g., known in the art as an MR-Linac), consistent with the disclosed embodiments. As shown, system 300 may include a couch 210, an image acquisition device 140, and a radiation delivery device 130. System 300 delivers radiation therapy to a patient in accordance with a radiotherapy treatment plan. In some embodiments, image acquisition device 140 may correspond to image acquisition device 32 in FIG. 1A, which may acquire origin images of a first modality (e.g., the MRI image shown in FIG. 4A) or destination images of a second modality (e.g., the CT image shown in FIG. 4B).

Couch 210 may support a patient (not shown) during a treatment session. In some implementations, couch 210 may move along a horizontal translation axis (labelled “I”), such that couch 210 can move the patient resting on couch 210 into and/or out of system 300. Couch 210 may also rotate around a central vertical axis of rotation, transverse to the translation axis. To allow such movement or rotation, couch 210 may have motors (not shown) enabling the couch to move in various directions and to rotate along various axes. A controller (not shown) may control these movements or rotations in order to properly position the patient according to a treatment plan.

In some embodiments, image acquisition device 140 may include an MRI machine used to acquire 2D or 3D MRI images of the patient before, during, and/or after a treatment session. Image acquisition device 140 may include a magnet 146 for generating a primary magnetic field for magnetic resonance imaging. The magnetic field lines generated by operation of magnet 146 may run substantially parallel to the central translation axis I. Magnet 146 may include one or more coils with an axis that runs parallel to the translation axis I. In some embodiments, the one or more coils in magnet 146 may be spaced such that a central window 147 of magnet 146 is free of coils. In other embodiments, the coils in magnet 146 may be thin enough or of a reduced density such that they are substantially transparent to radiation of the wavelength generated by radiotherapy device 130. Image acquisition device 140 may also include one or more shielding coils, which may generate a magnetic field outside magnet 146 of approximately equal magnitude and opposite polarity in order to cancel or reduce any magnetic field outside of magnet 146. As described below, radiation source 134 of radiotherapy device 130 may be positioned in the region where the magnetic field is cancelled, at least to a first order, or reduced.

Image acquisition device 140 may also include two gradient coils 148 and 149, which may generate a gradient magnetic field that is superposed on the primary magnetic field. Gradient coils 148 and 149 may generate a gradient in the resultant magnetic field that allows spatial encoding of the protons so that their position can be determined. Gradient coils 148 and 149 may be positioned around a common central axis with the magnet 146, and may be displaced along that central axis. The displacement may create a gap, or window, between coils 148 and 149. In the embodiments wherein magnet 146 also includes a central window 147 between coils, the two windows may be aligned with each other.

In some embodiments, image acquisition device 140 may be an imaging device other than an MRI device, such as an X-ray, CT, CBCT, spiral CT, PET, SPECT, optical tomography, fluorescence imaging, ultrasound imaging, or radiotherapy portal imaging device, etc. As would be recognized by one of ordinary skill in the art, the above description of image acquisition device 140 concerns certain embodiments and is not intended to be limiting.

Radiotherapy device 130 may include the source of radiation 134, such as an X-ray source or a linear accelerator, and a multi-leaf collimator (MLC) 132 (shown below in FIG. 2). Radiotherapy device 130 may be mounted on a chassis 138. One or more chassis motors (not shown) may rotate chassis 138 around couch 210 when couch 210 is inserted into the treatment area. In an embodiment, chassis 138 may be continuously rotatable around couch 210, when couch 210 is inserted into the treatment area. Chassis 138 may also have an attached radiation detector (not shown), preferably located opposite to radiation source 134 and with the rotational axis of chassis 138 positioned between radiation source 134 and the detector. Further, radiation therapy system 300 may include control circuitry (not shown) used to control, for example, one or more of couch 210, image acquisition device 140, and radiotherapy device 130. The control circuitry of radiotherapy device 130 may be integrated within the radiation therapy system 300 or remote from it.

During a radiotherapy treatment session, a patient may be positioned on couch 210. System 300 may then move couch 210 into the treatment area defined by magnetic coils 146, 148, 149, and chassis 138. Control circuitry may then control radiation source 134, MLC 132, and the chassis motor(s) to deliver radiation to the patient through the window between coils 148 and 149 according to a radiotherapy treatment plan.

FIG. 1B, FIG. 1C, and FIG. 1D generally illustrate examples of radiation therapy devices configured to provide radiotherapy treatment to a patient, including configurations where a radiation therapy output can be rotated around a central axis (e.g., an axis “A”). Other radiation therapy output configurations can be used. For example, a radiation therapy output can be mounted to a robotic arm or manipulator having multiple degrees of freedom. In yet another example, the therapy output can be fixed, such as located in a region laterally separated from the patient, and a platform supporting the patient can be used to align a radiation therapy isocenter with a specified target locus within the patient.

As discussed above, radiation therapy devices described by FIG. 1B, FIG. 1C, and FIG. 1D include a multi-leaf collimator for shaping, directing, or modulating an intensity of a radiation therapy beam to the specified target locus within the patient. FIG. 2 illustrates an exemplary multi-leaf collimator (MLC) 132 that includes leaves 132A through 132J that can be automatically positioned to define an aperture approximating a tumor 140 cross section or projection. The leaves 132A through 132J permit modulation of the radiation therapy beam. The leaves 132A through 132J can be made of a material specified to attenuate or block the radiation beam in regions other than the aperture, in accordance with the radiation treatment plan. For example, the leaves 132A through 132J can include metallic plates, such as comprising tungsten, with a long axis of the plates oriented parallel to a beam direction, and having ends oriented orthogonally to the beam direction (as shown in the plane of the illustration of FIG. 1B). A “state” of the MLC 132 can be adjusted adaptively during a course of radiation therapy treatment, such as to establish a therapy beam that better approximates a shape or location of the tumor 140 or other target locus. This is in comparison to using a static collimator configuration or as compared to using an MLC 132 configuration determined exclusively using an “offline” therapy planning technique. A radiation therapy technique using the MLC 132 to produce a specified radiation dose distribution to a tumor or to specific areas within a tumor can be referred to as Intensity Modulated Radiation Therapy (IMRT).

FIG. 3 illustrates an example of another type of radiotherapy device 130 (e.g., a Leksell Gamma Knife), according to some embodiments of the present disclosure. As shown in FIG. 3, in a radiotherapy treatment session, a patient 202 may wear a coordinate frame 220 to keep stable the patient's body part (e.g., the head) undergoing surgery or radiotherapy. Coordinate frame 220 and a patient positioning system 222 may establish a spatial coordinate system, which may be used while imaging a patient or during radiation surgery. Radiotherapy device 130 may include a protective housing 214 to enclose a plurality of radiation sources 212. Radiation sources 212 may generate a plurality of radiation beams (e.g., beamlets) through beam channels 216. The plurality of radiation beams may be configured to focus on an isocenter 218 from different directions. While each individual radiation beam may have a relatively low intensity, isocenter 218 may receive a relatively high level of radiation when multiple doses from different radiation beams accumulate at isocenter 218. In certain embodiments, isocenter 218 may correspond to a target under surgery or treatment, such as a tumor.

FIGS. 4A and 4B depict the differences between exemplary images acquired using different imaging modalities, depending on which image acquisition device 32 is utilized. Different image modalities characterize patient tissue differently. As a result, images acquired of the same underlying object using different image modalities often may not resemble each other in image characteristics. For example, FIG. 4A shows a two-dimensional (2D) image of a patient's head acquired using an MRI device, and FIG. 4B shows a corresponding 2D image of the same object acquired using a CT device. As shown in FIGS. 4A and 4B, the two images represent the same object, but they present noticeably different image characteristics. For example, in FIG. 4A, the skull and vitreous body have roughly identical intensity values in the MRI image. In contrast, FIG. 4B depicts very different intensity values for the skull and vitreous body, as indicated by the brightness (e.g., white outline) of the skull in the CT image. Therefore, the MRI image and the CT image intensity values can greatly differ.

The disclosed systems and methods are directed to generating a synthetic image (e.g., a pseudo-CT image) from an origin image acquired using a first modality (e.g., an MRI image, as shown by way of example in FIG. 4A) to resemble a destination image (e.g., a CT image, as shown by way of example in FIG. 4B) acquired using a second modality.

FIG. 4C compares MR and CT images of a patient's head in three different planes and provides error maps. For example, the top row of FIG. 4C depicts three origin images acquired using MRI from three different anatomical planes. The second row depicts synthetic CT images, generated based on the MRI images shown in the top row, using the disclosed image conversion systems and methods. The third row depicts three-planar images of the same object acquired using CT. The fourth row shows error maps representing the differences between the synthetic CT images (second row) and the acquired CT images (third row). As shown in the fourth row, the synthetic CT images provide pixel values closely resembling the CT values in the acquired CT images for most parts of the head region. In the example of FIG. 4C, the error maps indicate that errors (e.g., differences in intensity between corresponding voxels in a synthetic CT image and a corresponding real CT image) are minimal (e.g., within a noise level of a typical CT image) and restricted to interfaces between different tissue types, especially around the borders of bones and air.

FIG. 5 depicts an exemplary image conversion system 400, consistent with disclosed embodiments. In some embodiments, image conversion system 400 may include an image acquisition device 140, an image database 420, an image conversion unit 440, a training database 410 storing training images 412, a training unit 430, and a network 460. The image conversion system 400 can perform two functions: (a) provide a training unit 430 to train a predictive model using the training images 412; and (b) provide an image conversion unit 440 configured to use the trained predictive model to generate synthetic images. In an example, the image acquisition device 140 can correspond to the image acquisition device 32 as shown in FIG. 1A.

The disclosed systems and methods use predictive models to generate synthetic images. Consistent with the above disclosure, the term “synthetic image,” as used herein, generally refers to any computer-generated image data representing or resembling an image acquired in one modality (e.g., a CT image), while based on actual image data acquired from an imaging device using a different modality (e.g., an MRI image). As explained below, such synthetic images are converted from origin images acquired using a first modality (e.g., MRI) to resemble images generated using a second modality (e.g., CT). The predictive model may advantageously enable automatic conversion of images without the need for tissue classification, registration to an atlas, or model-based methods that require manually crafted features.

In some embodiments, the disclosed systems, methods, devices, and processes concern generation of synthetic CT images from images acquired using another modality, such as MRI images. In some aspects, the predictive models may be neural networks, such as types of convolutional neural networks (CNN) or deep neural networks, and other types of artificial intelligence known in the art. Synthetic image generation may thus include two stages. In a first stage, the predictive model may be generated (also known as “trained” or “learned”) using training data, including training MRI images and their corresponding real CT images. The training process may automatically use the training data to learn parameters or weights of the predictive model. These parameters or weights may be learned iteratively, according to an optimality criterion. In a second stage, synthetic CT images (e.g., also referred to as pseudo-CT images) may be generated from the MRI images using the predictive model.

An “origin image” may refer to an image acquired using a first imaging modality, such as nuclear magnetic resonance imaging (MRI). A “destination image” may refer to an image acquired using a second modality differing from the first modality, such as computed tomography (CT).

As described above, a “synthetic image” may refer to an image generated from one or more “origin images,” but resembling a “destination image.” In some embodiments, the synthetic image may be referred to as a pseudo-CT image. That is, the synthetic image may have the characteristics of the destination image. For example, a synthetic CT image (“synthetic image”) generated from one or more MRI images (“origin images”) may have the characteristics of a real CT image that is acquired by a CT scanner (“destination image”). Throughout this disclosure, examples will be discussed with the origin image being an MRI image and the destination image being a CT image. But this is not a limitation. Alternatively, for example, the origin image could be a CT image and the destination image could be an MRI image. In an example where a synthetic CT image can be generated from one or more MRI images, the synthetic CT image can be used in a radiotherapy planning process, without exposing a patient to ionizing radiation as in a typical CT imaging process.

As used herein, an image “resembles” another image when a difference between the two images is smaller than predetermined threshold criteria. In an example, a synthetic CT image can “resemble” a corresponding CT image if voxel-by-voxel differences in intensity between the two images are less than 20 Hounsfield units, and in some examples, less than 100 Hounsfield units on average. For example, the difference may be determined as an error map indicating the differences in pixel values between the two images. The difference may be measured by a loss function, such as a mean absolute error or a mean squared error calculated based on the two images. However, synthetic images (e.g., pseudo-CT images) consistent with the present disclosure may be shown to resemble a destination image using other types of image threshold criteria. Further, systems and methods consistent with the present disclosure do not necessarily need to apply any such predetermined threshold criteria in order for a destination image to resemble an origin image. In other words, and as described below, systems and methods consistent with this disclosure may use any such predetermined threshold criteria to potentially validate a degree of resemblance between the origin and destination images. The systems and methods consistent with the present disclosure may generate synthetic images (e.g., including pseudo-CT images) that meet the predetermined threshold criteria.
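
As a hedged illustration only, such a resemblance check could be computed from two equally sized arrays of Hounsfield-unit values; the function name and the 100 HU threshold below are assumptions made for this sketch, not values mandated by this disclosure:

    import numpy as np

    def resembles(synthetic_ct, real_ct, hu_threshold=100.0):
        # voxel-by-voxel error map between the two images (in Hounsfield units)
        error_map = np.abs(synthetic_ct.astype(np.float64) - real_ct.astype(np.float64))
        mae = error_map.mean()          # mean absolute error (one possible loss function)
        mse = np.mean(error_map ** 2)   # mean squared error (an alternative loss function)
        return mae < hu_threshold, mae, mse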

In order for the training unit 430 to train a predictive model, training images may be utilized. Training database 410 may be configured to store one or more sets of training images 412, consistent with the disclosed embodiments. Each set of training images may include one or more origin images (“origin images”) acquired using the first imaging modality and corresponding destination images (“destination images”) acquired using the second imaging modality. The one or more origin images and the corresponding destination images are acquired of the same object, and may include corresponding views of this object (e.g., images of the same anatomical portion taken at substantially the same orientation, but using different types of image modalities). As explained below, the origin images and destination images may be either 2D or 3D images. In some embodiments, database 410 may be part of an oncology information system that manages oncology treatment plans for patients. In some aspects, database 410 may receive these image sets from an image database having images previously acquired by image acquisition device 140 during one or more radiotherapy treatment sessions.

Image database 420 may store images acquired using the first imaging modality. In some embodiments, image acquisition device 140 may acquire the images stored in image database 420. The images stored in image database 420 may also correspond to images acquired during one or more radiotherapy treatment sessions. As explained below, the images may be 2D and/or 3D images.

In some aspects, image conversion system 400 may be configured to perform a training stage. In some embodiments, training unit 430 may be configured to use training data from training database 410 to train a predictive model. Training unit 430 may be implemented in hardware and/or software, as would be recognized by one of skill in the art. During the training stage, training unit 430 may be configured to generate estimated synthetic images based on training origin images received from training database 410. Training unit 430 may also generate error maps by comparing the estimated synthetic images to training destination images (e.g., CT images) and may then adjust the parameters of the predictive model based on the generated error maps that identify differences between the synthetic images and the real CT images. Training unit 430 may be configured to continue training the predictive model until certain stopping criteria are satisfied (e.g., based on the error between the model prediction and the real CT image, on the number of training iterations, or on whether the model error is less than a threshold value). In an example, the stopping criteria can be satisfied when a number of training iterations exceeds a threshold value (e.g., when there have been more than 100 training epochs; in neural network training, one epoch can be equal to the number of iterations needed to go through all training data one time). In an example, the stopping criteria can be satisfied if a voxel-by-voxel difference in intensity between the two images is less than a threshold value (e.g., 20 Hounsfield units). Training unit 430 may then save the trained predictive model, which may later be used by image conversion unit 440.
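
A simplified sketch of such a training loop, using the two example stopping criteria above (an epoch limit and an error threshold), is shown below; the model, optimizer, and data loader are generic placeholders rather than the disclosed implementation:

    import torch

    MAX_EPOCHS = 100       # stopping criterion: number of training epochs
    HU_THRESHOLD = 20.0    # stopping criterion: mean voxel-wise error in Hounsfield units

    def train(model, optimizer, loader):
        loss_fn = torch.nn.L1Loss()   # mean absolute error between synthetic and real CT
        for epoch in range(MAX_EPOCHS):
            epoch_error = 0.0
            for mri_batch, ct_batch in loader:
                optimizer.zero_grad()
                synthetic_ct = model(mri_batch)         # estimated synthetic image
                loss = loss_fn(synthetic_ct, ct_batch)  # summarizes the error map
                loss.backward()                         # adjust model parameters
                optimizer.step()
                epoch_error += loss.item()
            if epoch_error / len(loader) < HU_THRESHOLD:
                break                                   # error-based stopping criterion met
        return model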

In another aspect, the image conversion system 400 may also be configured to perform a conversion stage. During the conversion stage, image conversion unit 440 may be configured to receive a trained predictive model from training unit 430. In some embodiments, image conversion unit 440 may be configured to generate synthetic images from origin images 422 received from image database 420. Image conversion unit 440 may be implemented in hardware and/or software, as would be recognized by one of skill in the art. Image conversion unit 440 may be configured to generate the synthetic images using the trained predictive model and the received origin images 422.

In some embodiments, image conversion system 400 may be configured to display the synthetic image to be accessed through a user interface (e.g., a graphical user interface as provided with a computer, which may also include a tablet, an iPad, a mobile device, and the like), store the synthetic image in image database 420 for further use in treatment planning, and provide the synthetic image to Treatment Planning Software 115. As an additional example, image conversion system 400 may be configured to store the origin image and the synthetic image in training database 410. Such stored images may become part of training images 412. In various aspects, storage of the origin image and the synthetic image may be automatic, semi-automatic, or manual.

Image acquisition device 140, as discussed with regard to FIG. 1, may be configured to acquire images using one or more imaging modalities (as described above), including MRI, functional MRI (e.g., fMRI, DCE-MRI, and diffusion MRI), CT, CBCT, spiral CT, PET, SPECT, X-ray, optical tomography, fluorescence imaging, ultrasound imaging, and radiotherapy portal imaging, etc. In some embodiments, image acquisition device 140 may provide acquired images to training database 410, image database 420, training unit 430, and image conversion unit 440.

Network 460 may be configured to provide communications between the components of FIG. 5. For example, network 460 may be any type of network (including infrastructure) that provides communications, exchanges information, and/or facilitates the exchange of electronic information between one or more devices. In this regard, network 460 may include a wired connection (e.g., a LAN or other hardwired connection), a wireless connection (e.g., WiFi, Bluetooth, ZigBee, NFC, WiMAX, LTE, and the like), a computer bus, a serial connection, a parallel connection, an Ethernet connection, a local area network or a wide area network, an internet connection, a satellite connection, or any other suitable connection(s), including a connection to a cloud computing service, or any combination thereof that enables the components of image conversion system 400 to send and receive information between each other in any format and under any communications protocol.

It is contemplated that FIG. 5 illustrates only an exemplary arrangement of image conversion system 400. In some embodiments, additional components may be added, and/or the depicted components may be combined, divided, modified, or removed. For example, the training stage may be performed in advance and performed separately by another system. Accordingly, image conversion system 400 may not include training database 410 or training unit 430. Further, in some aspects, at least one component of image conversion system 400 may be located geographically remotely from the remaining components, and may communicate with the remaining components through the network 460. For example, the training stage, including the training unit 430 and training database 410, may be located in a research and development department, whereas the conversion stage, including the image conversion unit and origin images, may be located in the radiotherapy clinic.

In still other embodiments, two or more components of image conversion system 400 may be implemented at a single location (e.g., a radiotherapy treatment room). Also, while FIG. 5 shows various components of image conversion system 400 as separate, image conversion system 400 may implement some of these components in a single device. For example, database 410 and training unit 430 may be implemented within a single device, such as a tablet, a laptop, a desktop, a workstation, a server, or a purpose-built image conversion device. Similarly, training unit 430 and image conversion unit 440 may be implemented within a single device.

FIG. 6 illustrates an exemplary image pre-processing process 600. Process 600 may be performed by training unit 430 as part of the training stage or prior to processing done by the training stage. Certain aspects of process 600 (e.g., receiving the origin image at 601) may also be performed by image conversion unit 440 as part of or prior to the conversion stage. At 601, image conversion system 400 may receive an origin image 422 of a patient that was acquired using a first imaging modality. Consistent with the disclosed embodiments, the origin image 422 may be received from an image acquisition device 140, or alternatively, the image conversion system 400 may receive a training image 412 from training database 410. The image conversion system then provides the origin image to be normalized.

At 603, training unit 430 may normalize the origin image 422. This normalization process may improve the accuracy of image conversion system 400 by standardizing the input image data provided to the predictive model. Here, at 603, the training unit 430 may perform a variety of processing functions on the origin image 422 to generate a normalized origin image with a standardized alignment, resolution, and/or intensity value distribution. For example, training unit 430 may align the acquired origin image 422 with a standard CT image, such as to adjust the orientation of the object in the image. Training unit 430 may also resample the origin image 422 so that the normalized origin image has a reference resolution and/or reference field of view. For example, the origin image may be up-sampled using interpolation methods or down-sampled using averaging or max pooling methods. In an example, the origin image can be up-sampled or down-sampled, such as to provide an origin image having the same spatial resolution as a corresponding destination image.
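
A minimal sketch of such resampling and intensity standardization, assuming the origin image is available as a NumPy array with known voxel spacing (the reference spacing and the z-score normalization are illustrative assumptions, not the disclosed procedure):

    import numpy as np
    from scipy.ndimage import zoom

    def normalize_origin_image(image, voxel_spacing, reference_spacing=(1.0, 1.0, 1.0)):
        # resample to the reference resolution by linear interpolation
        factors = [s / r for s, r in zip(voxel_spacing, reference_spacing)]
        resampled = zoom(image.astype(np.float64), factors, order=1)
        # shift and scale intensities toward a standardized distribution
        return (resampled - resampled.mean()) / (resampled.std() + 1e-8)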

Further, the training unit 430 may correct the intensity value distribution of the origin image 422 to better approximate a standardized intensity value distribution. For example, when the origin image 422 is an MRI image (e.g., exemplary image 611), training unit 430 may use the N3 bias field correction algorithm known in the art to correct intensity non-uniformities in the origin image. As known in the art, other correction algorithms may be applied, such as joint histogram registration, landmark or histogram matching, or dynamic histogram warping. At the end of 603, a normalized origin image is created and provided to 605.

At 605, training unit 430 may generate a binary mask (e.g., exemplary image 615) from the normalized origin image. The training unit 430 may be configured to use the binary mask to remove undesirable portions (e.g., corresponding to a stereotactic head frame) of the normalized origin image 422. The training unit 430 may generate the binary mask by applying thresholding and spatial filtering to the normalized origin image. The training unit 430 may automatically determine the thresholds based on a histogram of image intensity values of the origin image 422, according to methods known to one of skill in the art. For example, training unit 430 may determine a threshold value to apply or compare to each intensity value of the normalized origin image. The training unit 430 may compare each intensity value of the normalized origin image with the threshold value. The predetermined threshold value may be a default intensity value. Through this comparison, the training unit 430 may produce a binary mask image having logical “1” or “0” intensity values. The intensity values in the binary mask image depend on whether the corresponding intensity values of the original origin image meet or exceed the threshold value. In some implementations, training unit 430 may also spatially filter this resulting binary mask image using known morphological operations to generate the final binary mask.

The training unit 430 may then use the generated binary mask to remove the stereotactic head frame in an origin image. For example, portions of the origin image within the binary mask may keep their original intensity values. Other portions of the origin image outside the binary mask may then be set to a predetermined intensity value.
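
The mask generation and masking described in the two preceding paragraphs might be sketched as follows, assuming the threshold has already been derived from the image histogram (the threshold and fill value are hypothetical choices for this sketch):

    import numpy as np
    from scipy.ndimage import binary_opening, binary_closing

    def apply_binary_mask(origin_image, threshold, fill_value=0.0):
        mask = origin_image >= threshold                   # logical "1"/"0" mask from thresholding
        mask = binary_closing(binary_opening(mask))        # spatial filtering via morphological operations
        masked = np.where(mask, origin_image, fill_value)  # keep values inside the mask only
        return masked, mask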

At 607, training unit 430 may receive a destination image 412 of the same object acquired using another imaging modality (e.g., CT image 617). The destination image 412 can be received from the training database 410. This destination image 412 may correspond to the object(s) shown in the origin image 611. For example, the origin image 611 may be an MRI image of a particular portion of a patient's anatomy and the destination image 412 may be a corresponding CT image of the same anatomical portion. Training unit 430 may receive the destination image 412 from image acquisition device 140 or training database 410. For example, when the destination image 412 is a CT image, the destination image 412 may include portions of a stereotactic head frame (e.g., image element 618 a and image element 618 b that correspond to the head frame). Typically, the stereotactic head frame is not shown in the corresponding MRI image (e.g., the origin image 611).

At 609, training unit 430 may normalize the destination image 412. The normalization of the origin image 611 was described above; here, at 609, normalization of the destination image may similarly improve the accuracy of image conversion system 400. The normalization of the destination image is performed by standardizing the input data provided to train the predictive model, e.g., neural network 700, as shown in FIG. 7D and described below. The training unit 430 may perform a variety of processing functions on the destination image to generate a normalized destination image with a standardized alignment, resolution, and/or intensity value distribution.

In some aspects, an image registration or image alignment process is performed to align the destination image with the origin image for each pair of training images. This is needed because the destination and the origin images may be acquired by different imaging devices or at different scanning times, and hence they may not be spatially aligned.
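
Registration is usually performed with a dedicated toolkit; purely as a toy illustration (not the disclosed method), a translation-only alignment that matches the centers of mass of the two images could look like this:

    import numpy as np
    from scipy.ndimage import center_of_mass, shift

    def align_by_center_of_mass(destination, origin):
        # translate the destination image so its center of mass matches the origin image's
        offset = np.array(center_of_mass(origin)) - np.array(center_of_mass(destination))
        return shift(destination, offset, order=1)   # linear interpolation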

In some aspects, the destination image may also be segmented using the binary mask. For example, the binary mask may correspond to image portions associated with the patient. Other portions of the destination image outside the binary mask may then be set to a predetermined intensity value. The predetermined value may be a default intensity value. For example, when the destination image is a CT image, the portions may be set to a Hounsfield scale value of −1000. In this manner, by using the binary mask, the portion of the CT image showing, for example, the stereotactic head frame may be eliminated or reduced, in whole or in part (e.g., image 619), because the head frame portion was outside the binary mask.
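
Continuing the masking sketch above, the corresponding step for a CT destination image would fill voxels outside the mask with the Hounsfield value of air; the helper name below is hypothetical:

    import numpy as np

    def mask_destination_ct(ct_image, body_mask, air_hu=-1000.0):
        # keep CT values inside the binary mask; set everything outside
        # (e.g., a stereotactic head frame) to -1000 HU
        return np.where(body_mask, ct_image, air_hu)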

It should be noted, however, that image conversion system 400 may include process 600 as an optional step. In some embodiments, image conversion system 400 may not utilize every aspect of process 600. For instance, image conversion system 400 may train the predictive model directly using acquired images. For example, if the images do not contain image features such as the head frame, then process 600 may not be utilized. The disclosed method may be able to train the predictive model to be adaptive to training images without preprocessing. For example, image conversion system 400 may still generate high-quality synthetic images using a predictive model trained on training origin and training destination images that were not pre-processed using functionality included in process 600.

With or without pre-processing, training unit 430 may use the training images (including the training origin images and training destination images) to train the predictive model. Specifically, the predictive model may be a convolutional neural network (CNN), which is a known type of supervised machine learning. A convolutional neural network may include a stack of distinct layers that transform an input into an output. The layers may differ in input size, output size, and the relationship between the input and the output for the layer. Each layer may be connected to one or more upstream and downstream layers in the stack of layers. The performance of a convolutional neural network may thus depend on the number of layers, and the convolutional neural network's complexity may increase as the number of layers increases. A convolutional neural network may be viewed as “deep” if it has more than one stage of non-linear feature transformation, which typically means the number of layers in the network is above a certain number. For example, some convolutional neural networks may include about 10-30 layers, or in some cases more than a few hundred layers. Examples of convolutional neural network models include AlexNet, VGGNet, GoogLeNet, ResNet, etc. These convolutional neural network models were designed for image recognition tasks, and can be used at the encoding part of the full convolutional neural network model in the disclosed embodiments.

FIGS. 7A-7E illustrate the structure and various components of exemplary convolutional neural network models. Among others, a convolutional neural network may typically include a plurality of convolutional layers, de-convolutional layers, pooling (e.g., down-sampling) layers, and unpooling (e.g., up-sampling) layers. FIG. 7A illustrates the operation of an exemplary convolutional layer of a convolutional neural network. Such a convolutional layer may be configured to perform two- or three-dimensional convolutions of the input to the convolutional layer with a set of learnable filter functions. A learnable filter function may be defined by a matrix of weights W, where each weight is to be applied to an image pixel during a convolution operation, and an offset value b. Weight matrix W and offset b are among the model parameters that need to be learned during the training stage. The convolutional layer may be configured to further apply an activation function to the result of these convolutions. Non-limiting examples of the activation function include a rectified linear units function, a sigmoid function, a hyperbolic tangent function, or a similar non-linear function. The combination of the filter function and the activation function may be expressed as:

h = max(0, W*X + b)  (Eq. 1)

Eq. 1 consists of two parts: a learnable filter function W*X+b, and an activation function max(0,⋅). X may be the input to the filter function W*X+b. In some embodiments, X may be two dimensional (e.g., an image) or three dimensional (e.g., a stack of images). The activation function max(0,⋅) may be applied to the output of this learnable filter function to generate output h. This output, or feature map, may indicate the importance of features in the input. The activation function max(0,⋅) may be a rectified linear units function that may act as an activation filter, by zeroing out negative values of the convolution operation. In some embodiments, zero-padding may be used to ensure that the spatial size of h is equivalent to the spatial size of X. As a non-limiting example, when X is 1024×1024, zero-padding may be used to ensure that the spatial size of h is also 1024×1024. When an n×n filter function is used, n−1 rows (or columns) of zeros may be added to the margins of matrix X before X is convolved with the filter. As a non-limiting example, the learnable filter functions may be 3×3 filters, in which case 2 rows and 2 columns of zeros may be added to X. For example, a row and column of zeros may be added to each margin of X.

To illustrate, FIG. 7A shows the application of a learnable filter function and activation function to a two-dimensional input feature map in a convolutional layer. Here, an exemplary 3×3 filter function (comprising weight matrix 711 a and offset value 711 b) and an activation function are applied to an exemplary 4×4 input feature map 713 a to generate a 4×4 output feature map 713 c. Input feature map 713 a may first be padded with 2 rows and 2 columns of zeros, as shown by zero padding 715. Here, one column (or row) of zeros may be added to each margin of input feature map 713 a. But as would be appreciated by one of skill in the art, the amount of zero padding may depend on the size of weight matrix 711 a, and the particular amount and arrangement of zero padding shown is not intended to be limiting. After the zero padding, input feature map 713 a becomes 6×6, and may have values x^(n,m) in its nth row, mth column (n, m=1, 2, . . . 6). Weight matrix 711 a may contain a total of 9 weights, e.g., w^(j,k) for the weight in the j^(th) row, k^(th) column of the filter matrix. In the example shown, offset value 711 b may be a single offset value. The predictive model may be configured to convolve weight matrix 711 a with the zero-padded input feature map 713 a, the result of which is shown as intermediate map 713 b. To perform the convolution, at each position, weight matrix 711 a will overlap with a 3×3 portion of the zero-padded input feature map 713 a. For example, as shown in FIG. 7A, weight matrix 711 a may overlap with the 3×3 portion centered at x^(3,5) (convolution input 719 a). The respective pixel values of input feature map 713 a may be multiplied with the respective weights overlapping with them, and the multiplication results are summed to derive value h^(2,4) in intermediate map 713 b (convolution output 719 b). The convolutional layer may generate output feature map 713 c by summing each element of intermediate map 713 b with offset value 711 b and applying activation function max(0,⋅) to each resulting sum. For example, as shown in FIG. 7A, when the sum of offset value 711 b and value h^(2,4) is less than zero, activation function output 719 c may adjust it to zero. Other values of output feature map 713 c may be computed as the learnable filter slides to overlap with different portions of input feature map 713 a.
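
The computation of Eq. 1 for a single two-dimensional feature map and one 3×3 filter can be written out directly; the following is a minimal sketch of that single-filter case (not the full network), using the same zero-padding convention described above:

    import numpy as np

    def conv_relu(x, w, b):
        # h = max(0, W*X + b): zero-pad, slide the filter, add the offset, then rectify
        k = w.shape[0]                       # filter size, e.g., 3
        pad = k // 2                         # one row/column of zeros per margin for a 3x3 filter
        xp = np.pad(x, pad)                  # zero padding keeps the output the same spatial size as X
        h = np.zeros_like(x, dtype=np.float64)
        for i in range(x.shape[0]):
            for j in range(x.shape[1]):
                window = xp[i:i + k, j:j + k]        # portion of the padded input overlapped by the filter
                h[i, j] = np.sum(window * w) + b     # convolution result plus offset value
        return np.maximum(h, 0.0)                    # rectified linear activation max(0, .)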

In a similar fashion, a convolutional layer may apply a learnable filter function and activation function to a three-dimensional input feature map. Here the weight matrix may be three dimensional. As in the two-dimensional case, the predictive model may be configured to convolve the weight matrix with the zero-padded input feature map to generate an intermediate map. But in the three-dimensional case the weight matrix may overlap with a volume of the zero-padded input feature map at each position. For example, when the input feature map includes a stack of three images, the weight matrix may be 3×3×3, and may overlap with a 3×3×3 volume of the zero-padded input feature map during the convolution operation. As in the two-dimensional case, the respective pixel values of the input feature map may be multiplied with the respective weights overlapping with them, and the multiplication results may be summed to derive a corresponding value in a three-dimensional intermediate map. This corresponding value may be summed with an offset value, and an activation function may be applied to the result to generate a three-dimensional output feature map.

A convolutional neural network may include stacks of convolutional layers comprising one or more convolutional layers. In some embodiments, these stacks may include between two and five convolutional layers. Furthermore, different stacks of convolutional layers may comprise differing numbers of convolutional layers. FIG. 7B illustrates the operation of convolutional layer(s) 720, an exemplary stack of convolutional layers. Convolutional layer(s) 720 may be configured to receive feature map(s) 721 a and output feature map(s) 721 c. Convolutional layer(s) 720 may include convolutional layer 723 a, which may be configured to generate feature map(s) 721 b using feature map(s) 721 a, and convolutional layer 723 b, which may be configured to generate feature map(s) 721 c using feature map(s) 721 b. The number of feature maps in each of feature map(s) 721 a, feature map(s) 721 b, and feature map(s) 721 c may be predetermined.

In some embodiments, feature map(s) 721 a may include one or more origin images of an object. For example, feature map(s) 721 a may include T1-weighted MRI image(s), T2-weighted MRI image(s), and MRI image(s) generated using a contrast agent. As an additional example, feature map(s) 721 a may comprise different slices in a stack of 2D images of the object. These slices may be adjacent. In various embodiments, feature map(s) 721 a may include a stack of one or more feature maps generated by another component of the predictive model. For example, feature map(s) 721 a may comprise a stack of sixty-four feature maps generated by a previous convolutional layer of the predictive model.

Feature map(s) 721 b may include feature maps generated by convolutional layer 723 a. In some embodiments, convolutional layer 723 a may generate these feature maps according to Eq. 1. For example, convolutional layer 723 a may apply a learnable filter function and activation function to generate each feature map in feature map(s) 721 b. Each learnable filter function may be described by a weight matrix and an offset value. Values of the weight matrix and the offset are parameters of the learnable filter, which can be learned from data during the training stage of the convolutional neural network model. The number of parameters in the weight matrix may depend on the spatial size of the learnable filter function and the number of feature maps in feature map(s) 721 a. For example, when input feature map(s) 721 a includes l feature maps and the filter function is m×n in spatial size, the weight matrix may include m×n×l parameters. When output feature map(s) 721 b includes k feature maps, k learnable filter functions may be used and, accordingly, the overall number of parameters in convolutional layer 723 a will be k×(m×n×l+1). As shown in FIG. 7B, feature maps in feature map(s) 721 b and feature map(s) 721 a may be the same spatial size. In various embodiments, the number of feature maps in feature map(s) 721 a and feature map(s) 721 b may differ. For example, feature map(s) 721 a may include three feature maps, while feature map(s) 721 b may include more or fewer feature maps.
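
The parameter count k×(m×n×l+1) can be checked with a small numeric example: for l=3 input feature maps, 3×3 filters, and k=64 output feature maps, the layer holds 64×(3×3×3+1)=1,792 parameters. A quick verification with a standard convolution layer (an illustration, not the disclosed model) is:

    import torch.nn as nn

    layer = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3)
    n_params = sum(p.numel() for p in layer.parameters())
    assert n_params == 64 * (3 * 3 * 3 + 1)   # 1,792 weights and offsets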

Feature map(s) 721 b may be provided as inputs to the next convolutional layer 723 b, which generates feature map(s) 721 c. Similar to convolutional layer 723 a, convolutional layer 723 b may generate these feature maps according to Eq. 1, including applying one or more learnable filter functions and activation functions. Again, the number of feature maps in feature map(s) 721 b and feature map(s) 721 c may be the same, or they may differ. For example, feature map(s) 721 b may include 64 feature maps, while feature map(s) 721 c may include 128 feature maps.
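
A stack of two such convolutional layers, taking three input feature maps to 64 and then to 128 feature maps while preserving spatial size, might be sketched as follows (the channel counts are taken from the examples above and are not a required configuration):

    import torch.nn as nn

    conv_stack = nn.Sequential(
        nn.Conv2d(3, 64, kernel_size=3, padding=1),    # feature map(s) 721 a -> 721 b
        nn.ReLU(inplace=True),                         # activation max(0, .)
        nn.Conv2d(64, 128, kernel_size=3, padding=1),  # feature map(s) 721 b -> 721 c
        nn.ReLU(inplace=True),
    )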

FIG. 7C illustrates the operation of exemplary max pooling and unpooling layers of the predictive model. Pooling layers may receive and down-sample the input feature map received from a convolutional layer, generating an output feature map of reduced size. This output feature map may have lower spatial resolution than the input feature map, and subsequent convolutional layers can then learn image features with greater spatial extent, or greater spatial invariance, than the image features learned by the preceding convolutional layer at the higher spatial resolution. Thus, the predictive model may use pooling layers to help learn features at different levels of spatial resolution, improving conversion accuracy. For example, a pooling layer may use a 2×2 window with a stride of two input feature values (i.e., a non-overlapping window) to down-sample the feature map by a factor of two in each dimension. It is contemplated that the stride may differ and the window used may be any other suitable size, such as 3×3, 4×4, 8×8, 16×16, 32×32, etc.

In some embodiments, as shown in FIG. 7C, the pooling layer may be a max-pooling layer, which selects a single value equal to the maximum value within a pooling window. In the example of FIG. 7C, the pooling window is a 2×2 window. Other suitable sizes are contemplated for the pooling window. FIG. 7C depicts a portion of a pooling layer 731 a, showing the values within pooling window 733 a. The predictive model may be configured to output the largest value within pooling window 733 a (i.e., 0.8) to a corresponding location in post-downsampling feature map 735 a, as indicated by the arrow connecting pooling layer 731 a and post-downsampling feature map 735 a. The predictive model may also be configured to store locations of the maximum values within pooling layer 731 a. For example, the predictive model may be configured to store such locations in binary mask 737, as indicated by the arrow connecting pooling layer 731 a and binary mask 737. Binary mask 737 may be the same size as pooling layer 731 a, and may indicate which input features had the maximum value within each window. A pixel value of “1” (or “True”) in binary mask 737 may indicate that a corresponding pixel of pooling layer 731 a contains a maximum value within a window, and a pixel value of “0” (or “False”) may likewise indicate that a corresponding pixel of the feature map does not contain that maximum value within a window. As shown in FIG. 7C, the pixels of binary mask 737 corresponding to the pixel values 0.3, 0.6 and 0.1 in pooling window 733 a have a pixel value “0”, because those pixel values are not the maximum pixel value in pooling window 733 a, while the pixel of binary mask 737 corresponding to the pixel value 0.8 in pooling window 733 a has a pixel value “1”, because 0.8 was the maximum pixel value in pooling window 733 a. The predictive model may be configured to provide this information concerning maximum value locations to unpooling layers, as indicated by the arrow connecting binary mask 737 and unpooling layer 731 b.

Unpooling layers may increase the size of the input feature map by upsampling. This upsampling may increase the spatial resolution of a feature map, increasing the ability of image conversion system 400 to accurately depict high resolution details in the synthetic image. In some embodiments, each unpooling layer may use an unpooling window size of 2×2, matching the 2×2 pooling window used in the corresponding pooling layer. Thus, an unpooling layer may cause the size of the feature map to increase by a factor of two in each dimension. Other suitable sizes are contemplated for the unpooling window.

In some embodiments, as shown in FIG. 7C, the predictive model may be configured to relate a pixel value in pre-upsampling feature map 735 b to a pixel value in unpooling layer 731 b. As shown in FIG. 7C, a pixel in pre-upsampling feature map 735 b may correspond to an unpooling window 733 b in unpooling layer 731 b. A binary mask may further define a correspondence between pixels in pre-upsampling feature map 735 b and pixels in unpooling layer 731 b. This binary mask may have been generated by a corresponding pooling layer. For example, binary mask 737 generated by corresponding pooling layer 731 a may be used by unpooling layer 731 b. As described above, binary mask 737 may indicate which pixel values of pooling layer 731 a were stored in post-downsampling feature map 735 a. As shown in FIG. 7C, the pixel in pre-upsampling feature map 735 b with pixel value 0.7 may correspond to the top right pixel in unpooling window 733 b, because the top right value of the corresponding portion of binary mask 737 has a value of “1.” The predictive model may be configured to set this corresponding pixel in unpooling layer 731 b to the pixel value of the pixel in pre-upsampling feature map 735 b, as indicated by the arrow connecting pre-upsampling feature map 735 b and unpooling layer 731 b. The remaining elements of unpooling window 733 b may be set to a predetermined value, such as zero.
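
The pairing of max pooling with index-preserving unpooling is available in common deep learning libraries; the following brief sketch illustrates the behavior described above and is not the disclosed implementation:

    import torch
    import torch.nn as nn

    pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
    unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

    x = torch.rand(1, 1, 4, 4)           # one 4x4 feature map
    pooled, indices = pool(x)            # 2x2 maxima plus the stored locations of those maxima
    restored = unpool(pooled, indices)   # maxima returned to their original positions, zeros elsewhere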

FIGS. 7D and 7E depict neural network 700, an exemplary implementation of the predictive model used by image conversion system 400, consistent with the disclosed embodiments. Neural network 700 may be configured to convert digital images represented by matrices of values, each value corresponding to a pixel in the image. The size of the matrix may correspond to the spatial size of the image. For example, a 1024×1024 matrix may correspond to an image of 1024×1024 spatial size. The matrix representing the origin image and the generated matrix representing the synthetic image may have the same size (or spatial resolution). Unlike conventional methods that predict the synthetic image on a pixel-by-pixel basis (one value of the matrix at a time), the disclosed method predicts an entire synthetic image of equal spatial size to the origin image (all values of the matrix at once).

As described herein, neural network 700 may comprise input 741 and output 751. In some embodiments, a spatial size (or spatial resolution) of input 741 and output 751 may be the same. For example, input 741 may receive at least one 1024×1024 pixel origin image and output 751 may output at least one 1024×1024 pixel calculated synthetic image. Neural network 700 may, however, process image data of other spatial sizes.

Input 741 of neural network 700 may be configured to accept one or more origin images. When training neural network 700 to convert 3D images, the accuracy of the neural network 700 may be increased when input 741 receives a stack of adjacent images, as the adjacent 2D images may contain dependent structure information. Dependent structure information may include spatially dependent relationships between the anatomical structures shown in the stack of adjacent 2D images. These spatially dependent relationships may be along the axis orthogonal to the anatomical plane of the 2D images. As a non-limiting example, the shape and type of an anatomical structure represented by a first set of pixels in a first image of the stack may also be represented by a second set of pixels in a second image adjacent to the first image. This is because the first and second images are spatially neighboring each other along the axis orthogonal to the anatomical plane. As a result, the two images will have some dependency or continuity of the anatomical structures shown in these images. Therefore, the shape, size, and/or type of an anatomical structure in one image may provide additional information on the shape, size, and/or type of the anatomical structure in another adjacent image along the same plane. The effect of dependent structure information may depend on various factors, such as the number of adjacent images in the stack, the anatomical structures depicted in the images, and/or the imaging modality used for obtaining the images.

In some embodiments, the input 741 may also be configured to receive multi-channel MRI images. For example, one or more first channels of input 741 may be configured to receive T1-weighted MRI images and one or more second channels of input 741 may be configured to receive T2-weighted MRI images. As known in the art, T1 is the longitudinal relaxation constant and T2 is the transverse relaxation constant governing the rates at which excited protons in a tissue return to equilibrium. These rate constants differ from each other and differ between tissues. Depending on MRI imaging parameters, the contrast and brightness of a tissue in an MRI image may be predominantly determined by the T1 rate constant (a T1-weighted MRI image) or the T2 rate constant (a T2-weighted MRI image). Thus, T1-weighted MRI images and T2-weighted MRI images convey different information about an imaged object. Neural network 700 may be configured to use the additional information provided by using a first channel of T1-weighted MRI images and another channel of T2-weighted MRI images to improve conversion accuracy. Additionally or alternatively, channels may be devoted to images associated with other tissue parameters known in the art, such as spin density, or to MRI images acquired using contrast agents.

As shown in FIGS. 7D and 7E, image data may flow from input 741 to output 751 through encoding stage 740 and decoding stage 750. Encoding stage 740 may be configured to extract feature maps from an input image. In some implementations, encoding stage 740 may not include fully-connected layers. A fully-connected layer is generally one in which every pixel of the input feature map contributes to each pixel of the output feature map. As recognized in the art, fully-connected layers require an extremely large number of parameters and generate features suited for image classification tasks. Such features provide little benefit in this application, and excluding fully-connected layers may reduce the number of parameters required to train the convolutional neural network. Decoding stage 750 may be configured to convert feature maps output from encoding stage 740 into a synthetic image with the same spatial resolution as the original image.

As shown in FIGS. 7D and 7E, neural network 700 may comprise stacks of convolutional layers (e.g., convolutional layer(s) 743 a and convolutional layer(s) 753 a), pooling layers (e.g., pooling layer 745 a), and unpooling layers (e.g., unpooling layer 755 a). As described in FIGS. 7A and 7B, stacks of convolutional layers may be configured to generate output feature maps based on input feature maps (or origin images). As described in FIG. 7C, pooling layers may be configured to downsample feature maps, and unpooling layers may be configured to upsample feature maps.

As shown in FIGS. 7D and 7E, neural network 700 may be arranged in a hierarchy of levels of differing spatial resolution. Five such levels are depicted in FIGS. 7D and 7E, but neural network 700 may include additional levels or fewer levels. Each level may have an encoding and a decoding section (which, combined over all levels, include encoding stage 740 and decoding stage 750). Each level of spatial resolution in the hierarchy may be associated with feature maps of a certain spatial size, depending on the spatial size of the input image. With the exception of the lowest level, the output of the encoding section for each level may be provided to the input of the encoding section for the next level. For example, the output of convolutional layer(s) 743 a on the highest level may be provided to the input of pooling layer 745 a on the next lower level. With the exception of the highest level, the output of the decoding section may be provided to the input of the decoding section for the next highest level. For example, the output of convolutional layer(s) 753 d on the second-to-lowest level may be provided to the input of unpooling layer 755 c on the next higher level.

Furthermore, one or more layers in encoding stage 740 may be directly connected to a corresponding layer in decoding stage 750. For example, the final layer in convolutional layer(s) 743 a may output one or more feature maps to decoding stage 750. These feature map(s) may be combined (e.g., stacked) with the output feature map of unpooling layer 755 b to generate the input to convolutional layer 753 b. FIGS. 7D and 7E indicate such direct connections between convolutional layer(s) 743 a through convolutional layer(s) 743 d and unpooling layer 755 a through unpooling layer 755 d, respectively. The provided feature maps are indicated in FIGS. 7D and 7E as the dashed portions of unpooling layer 755 a through unpooling layer 755 d. Direct connections between encoding stage 740 and decoding stage 750 may enable high-resolution features learned in encoding stage 740 to be used in decoding stage 750. This may improve the ability of decoding stage 750 to generate more accurate synthetic images. Such direct connections may also improve the flexibility of neural network 700. For example, neural network 700 may be trained to assign little weight to lower spatial resolution features when higher resolution features provided through the direct connections are sufficient to generate accurate synthetic images.
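One common way to realize such a direct connection is to stack the encoder feature maps with the upsampled decoder feature maps along the channel dimension before the next convolutional layer(s). The snippet below is an assumption-laden sketch using PyTorch tensors with hypothetical shapes, not the disclosed network.

```python
import torch

# Hypothetical tensors standing in for the feature maps described above
# (batch, channels, height, width).
encoder_features = torch.rand(1, 64, 512, 512)    # e.g., output of convolutional layer(s) 743 a
upsampled_features = torch.rand(1, 64, 512, 512)  # e.g., output of an unpooling layer

# A "direct connection" realized by stacking the two sets of feature maps
# along the channel dimension; the following convolutional layer(s) then
# operate on the combined stack.
stacked = torch.cat([encoder_features, upsampled_features], dim=1)
print(stacked.shape)  # torch.Size([1, 128, 512, 512])
```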

The highest level of neural network 700 may include an input 741, convolutional layer(s) 743 a, unpooling layer 755 a, convolutional layer(s) 753 a, and output layer 759. In some embodiments, convolutional layer(s) 743 a, convolutional layer(s) 753 a, and output layer 759 may be stacks of one or more convolutional layers. For example, convolutional layer(s) 743 a and convolutional layer(s) 753 a may each include two to five convolutional layers. Alternatively, output layer 759 may include a single convolutional layer. As discussed with regards to FIGS. 7A and 7B, these convolutional layers may generate output feature maps by applying one or more learnable filter functions to their respective input feature maps. For example, the one or more convolutional layers comprising convolutional layer(s) 743 a and convolutional layer(s) 753 a may each apply between 50 and 100 (e.g., 64) learnable filter functions to generate a corresponding number of output feature maps.

The next-lower level of neural network 700 may include a pooling layer 745 a, convolutional layer(s) 743 b, unpooling layer 755 b, and convolutional layer(s) 753 b. In some embodiments, convolutional layer(s) 743 b and convolutional layer(s) 753 b may be stacks of one or more convolutional layers. For example, convolutional layer(s) 743 b and convolutional layer(s) 753 b may each include two to five convolutional layers. As discussed with regards to FIGS. 7A and 7B, these convolutional layers may generate output feature maps by applying one or more learnable filter functions to their respective input feature maps. For example, the one or more convolutional layers comprising convolutional layer(s) 743 b and convolutional layer(s) 753 b may each apply between 100 and 200 (e.g., 128) learnable filter functions to generate a corresponding number of output feature maps. Pooling layer 745 a may correspond to unpooling layer 755 a in the next-highest level. For example, pooling layer 745 a may be configured to generate a binary mask indicating the locations of maximum values in its input feature layer, and provide these indications to unpooling layer 755 a. Unpooling layer 755 a may be configured to use the binary mask when upsampling its input feature map.

The next-lower level of neural network 700 may include a pooling layer 745 b, convolutional layer(s) 743 c, unpooling layer 755 c, and convolutional layer(s) 753 c. In some embodiments, convolutional layer(s) 743 c and convolutional layer(s) 753 c may be stacks of one or more convolutional layers. For example, convolutional layer(s) 743 c and convolutional layer(s) 753 c may each include two to six convolutional layers. As discussed with regards to FIGS. 7A and 7B, these convolutional layers may generate output feature maps by applying one or more learnable filter functions to their respective input feature maps. For example, the one or more convolutional layers comprising convolutional layer(s) 743 c and convolutional layer(s) 753 c may each apply between 150 and 300 (e.g., 256) learnable filter functions to generate a corresponding number of output feature maps. Pooling layer 745 b may correspond to unpooling layer 755 b in the next-highest level. For example, pooling layer 745 b may be configured to generate indications of the locations of maximum values in its input feature map, and to provide these indications to unpooling layer 755 b. Unpooling layer 755 b may be configured to use these indications when upsampling the feature map.

The next-lower level of neural network 700 may include a pooling layer 745 c, convolutional layer(s) 743 d, unpooling layer 755 d, and convolutional layer(s) 753 d. In some embodiments, convolutional layer(s) 743 d and convolutional layer(s) 753 d may be stacks of one or more convolutional layers. For example, convolutional layer(s) 743 d and convolutional layer(s) 753 d may each include two to six convolutional layers. As discussed with regards to FIGS. 7A and 7B, these convolutional layers may generate output feature maps by applying one or more learnable filter functions to their respective input feature maps. For example, the one or more convolutional layers comprising convolutional layer(s) 743 d and convolutional layer(s) 753 d may each apply between 300 and 600 (e.g., 512) learnable filter functions to generate a corresponding number of output feature maps. Pooling layer 745 c may correspond to unpooling layer 755 c in the next-highest level. For example, pooling layer 745 c may be configured to generate indications of the locations of maximum values in its input feature layer, and to provide these indications to unpooling layer 755 c. Unpooling layer 755 c may be configured to use these indications when upsampling the feature map.

The lowest level of neural network 700 may include a pooling layer 745 d, convolutional layer(s) 743 e, and convolutional layer(s) 753 e. In some embodiments, convolutional layer(s) 743 e and convolutional layer(s) 753 e may be stacks of convolutional layers. For example, convolutional layer(s) 743 e and convolutional layer(s) 753 e may each include two to five convolutional layers. As discussed with regards to FIGS. 7A and 7B, these convolutional layers may generate output feature maps by applying one or more learnable filter functions to their respective input feature maps. For example, the one or more convolutional layers comprising convolutional layer(s) 743 e and convolutional layer(s) 753 e may each apply between 800 and 1200 (e.g., 1024) learnable filter functions to generate a corresponding number of output feature maps. Pooling layer 745 d may correspond to unpooling layer 755 d in the next-highest level. For example, pooling layer 745 d may be configured to generate indications of the locations of maximum values in its input feature layer, and to provide these indications to unpooling layer 755 d. Unpooling layer 755 d may be configured to use these indications when upsampling the feature map.
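To make the level-by-level description above concrete, the following sketch shows one way a five-level encoder-decoder with max pooling/unpooling and direct (skip) connections could be organized. It is a simplified illustration under stated assumptions, not the disclosed 27-layer network: it assumes PyTorch, uses one convolution per stack instead of two to five, and the class, layer, and variable names are hypothetical.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Stand-in for a stack of convolutional layers with learnable filters;
    # each convolution is followed by batch normalization and a ReLU.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class EncoderDecoderSketch(nn.Module):
    """Five-level encoder-decoder with max pooling/unpooling and direct
    (skip) connections; filter counts loosely follow 64/128/256/512/1024."""

    def __init__(self, in_channels=1):
        super().__init__()
        self.enc1 = conv_block(in_channels, 64)
        self.enc2 = conv_block(64, 128)
        self.enc3 = conv_block(128, 256)
        self.enc4 = conv_block(256, 512)
        self.enc5 = conv_block(512, 1024)            # lowest level
        self.pool = nn.MaxPool2d(2, stride=2, return_indices=True)
        self.unpool = nn.MaxUnpool2d(2, stride=2)
        # Decoder blocks; each receives unpooled maps stacked with the
        # corresponding encoder maps (hence the summed channel counts).
        self.dec5 = conv_block(1024, 512)
        self.dec4 = conv_block(512 + 512, 256)
        self.dec3 = conv_block(256 + 256, 128)
        self.dec2 = conv_block(128 + 128, 64)
        self.dec1 = conv_block(64 + 64, 64)
        self.out = nn.Conv2d(64, 1, kernel_size=1)   # single-convolution output layer

    def forward(self, x):
        e1 = self.enc1(x)
        p1, i1 = self.pool(e1)
        e2 = self.enc2(p1)
        p2, i2 = self.pool(e2)
        e3 = self.enc3(p2)
        p3, i3 = self.pool(e3)
        e4 = self.enc4(p3)
        p4, i4 = self.pool(e4)
        e5 = self.enc5(p4)
        d5 = self.dec5(e5)
        d4 = self.dec4(torch.cat([self.unpool(d5, i4), e4], dim=1))
        d3 = self.dec3(torch.cat([self.unpool(d4, i3), e3], dim=1))
        d2 = self.dec2(torch.cat([self.unpool(d3, i2), e2], dim=1))
        d1 = self.dec1(torch.cat([self.unpool(d2, i1), e1], dim=1))
        return self.out(d1)  # synthetic image with the input's spatial size

# Usage: the spatial size should be divisible by 16 for five levels, e.g. 256x256.
model = EncoderDecoderSketch(in_channels=1)
synthetic = model(torch.rand(1, 1, 256, 256))
print(synthetic.shape)  # torch.Size([1, 1, 256, 256])
```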

As described with regards to FIGS. 7D and 7E, one or more origin images (e.g., origin image 761) may be provided as input 741 to neural network 700. Neural network 700 may be configured to propagate input 741 through encoding stage 740 and decoding stage 750 to generate one or more destination images (e.g., destination image 769) as output 751. FIG. 7F depicts exemplary feature maps generated by selected layers of neural network 700 during steps of the encoding and decoding process. In particular, FIG. 7F depicts feature maps generated during each step of encoding stage 740, beginning with origin image 761 and ending with feature map 763 e. As shown, the relative spatial size of these exemplary feature maps decreases during the encoding stage (enlarged versions of feature map 763 d and feature map 763 e are provided for ease of comprehension). These feature maps also depict the increasing spatial extent of the features in each map, and the decreasing spatial resolution of each map. FIG. 7F also depicts feature maps generated during each step of decoding stage 750, beginning with feature map 773 e and ending with destination image 779. As shown, the relative spatial size of these exemplary feature maps increases during the decoding stage (enlarged versions of feature map 773 e and feature map 773 d are provided for ease of comprehension). These feature maps also depict the decreasing spatial extent of the features in each map, and the increasing spatial resolution of each map.

As a first encoding step, convolutional layer(s) 743 a may be configured to receive one or more origin images and generate feature maps. As described above, convolutional layer(s) 743 a may comprise a stack of one or more convolutional layers. In some embodiments, convolutional layer(s) 743 a may be configured to receive one or more origin images, such as origin image 761, and output feature maps, such as feature map 763 a. As described above, the disclosed systems and methods are not limited to images of a particular spatial size. For example, in some embodiments, the spatial size of the one or more origin images may be greater than 128×128 pixels, such as 1024×1024 pixels. In some embodiments, the one or more origin images may include at least one T1-weighted MRI image, T2-weighted MRI image, and/or MRI image generated using a contrast agent. As an additional example, the one or more origin images may include different slices in a stack of 2D images. In some embodiments, convolutional layer(s) 743 a may be configured to output the feature maps to a next-lower level of encoding stage 740. This next-lower level of encoding stage 740 may include pooling layer 745 a and convolutional layer(s) 743 b. Convolutional layer(s) 743 a may also be configured to output the feature maps directly to decoding stage 750.

As a second encoding step, pooling layer 745 a may be configured to downsample the feature maps received from convolutional layer(s) 743 a, generating downsampled feature maps with a smaller spatial size. Convolutional layer(s) 743 b may be configured to receive these downsampled feature maps and generate output feature maps, such as feature map 763 b. As shown, feature map 763 b may have a smaller spatial size than feature map 763 a. In some aspects, the spatial size of feature map 763 b may be half the spatial size of feature map 763 a in at least one dimension. In some embodiments, convolutional layer(s) 743 b may be configured to output the feature maps to a next-lower level of encoding stage 740. This next-lower level of encoding stage 740 may include pooling layer 745 b and convolutional layer(s) 743 c. Convolutional layer(s) 743 b may also be configured to provide the output feature maps directly to decoding stage 750.

As a third encoding step, pooling layer 745 b may be configured to downsample the feature maps received from convolutional layer(s) 743 b, generating downsampled feature maps with a smaller spatial size. Convolutional layer(s) 743 c may be configured to receive these downsampled feature maps and generate output feature maps, such as feature map 763 c. As shown, feature map 763 c may have a smaller spatial size than feature map 763 b. In some aspects, the spatial size of feature map 763 c may be half the spatial size of feature map 763 b in at least one dimension. In some embodiments, convolutional layer(s) 743 c may be configured to output the feature maps to a next-lower level of encoding stage 740. This next-lower level of encoding stage 740 may include pooling layer 745 c and convolutional layer(s) 743 d. Convolutional layer(s) 743 c may also be configured to provide the output feature maps directly to decoding stage 750.

As a fourth encoding step, pooling layer 745 c may be configured to downsample the feature maps received from convolutional layer(s) 743 c, generating downsampled feature maps with a smaller spatial size. Convolutional layer(s) 743 d may be configured to receive these downsampled feature maps and generate output feature maps, such as feature map 763 d. As shown, feature map 763 d may have a smaller spatial size than feature map 763 c. In some aspects, the spatial size of feature map 763 d may be half the spatial size of feature map 763 c in at least one dimension. In some embodiments, convolutional layer(s) 743 d may be configured to output the feature maps to the lowest level of encoding stage 740. This lowest level of encoding stage 740 may include pooling layer 745 d and convolutional layer(s) 743 e. Convolutional layer(s) 743 d may also be configured to provide the output feature maps directly to decoding stage 750.

As a fifth encoding step, pooling layer 745 d may be configured to downsample the feature maps received from convolutional layer(s) 743 d, generating downsampled feature maps with a smaller spatial size. Convolutional layer(s) 743 e may be configured to receive these downsampled feature maps and generate output feature maps, such as feature map 763 e. As shown, feature map 763 e may have a smaller spatial size than feature map 763 d. In some aspects, the spatial size of feature map 763 e may be half the spatial size of feature map 763 d in at least one dimension. In some embodiments, convolutional layer(s) 743 e may be configured to provide these output feature maps to the lowest level of decoding stage 750.

In a first decoding step, convolutional layer(s) 753 e may be configured to use the feature maps received from convolutional layer(s) 743 e to generate output feature maps, such as feature map 773 e. In some embodiments, convolutional layer(s) 753 e may be configured to provide these output feature maps to a higher level of decoding stage 750. This higher level of decoding stage 750 may include unpooling layer 755 d and convolutional layer(s) 753 d.

In a second decoding step, unpooling layer 755 d may be configured to upsample the feature maps received from convolutional layer(s) 753 e, generating upsampled feature maps with increased spatial size. As described above with regard to FIG. 7C, unpooling layer 755 d may use binary masks generated by pooling layer 745 d to assign values in the feature maps received from convolutional layer(s) 753 e to corresponding upsampled feature maps. Convolutional layer(s) 753 d may be configured to use the upsampled feature maps received from unpooling layer 755 d to generate output feature maps, such as feature map 773 d. In some aspects, convolutional layer(s) 753 d may also use feature maps received from convolutional layer(s) 743 d to generate the output feature maps. For example, convolutional layer(s) 753 d may be configured to apply one or more learnable filter functions to a stack of feature maps comprising the feature maps received from convolutional layer(s) 743 d and the upsampled feature maps received from unpooling layer 755 d. In some embodiments, the output feature maps generated by convolutional layer(s) 753 d may be provided by convolutional layer(s) 753 d to a higher level of decoding stage 750. This higher level of decoding stage 750 may include unpooling layer 755 c and convolutional layer(s) 753 c.

In a third decoding step, unpooling layer 755 c may be configured to upsample the feature maps received from convolutional layer(s) 753 d, generating upsampled feature maps with increased spatial size. As described above with regard to FIG. 7C, unpooling layer 755 c may use binary masks generated by pooling layer 745 c to assign values in the feature maps received from convolutional layer(s) 753 d to corresponding upsampled feature maps. Convolutional layer(s) 753 c may be configured to use the upsampled feature maps received from unpooling layer 755 c to generate output feature maps, such as feature map 773 c. In some aspects, convolutional layer(s) 753 c may also use feature maps received from convolutional layer(s) 743 c to generate the output feature maps. For example, convolutional layer(s) 753 c may be configured to apply one or more learnable filter functions to a stack of feature maps comprising the feature maps received from convolutional layer(s) 743 c and the upsampled feature maps received from unpooling layer 755 c. In some embodiments, the output feature maps generated by convolutional layer(s) 753 c may be provided by convolutional layer(s) 753 c to a higher level of decoding stage 750. This higher level of decoding stage 750 may include unpooling layer 755 b and convolutional layer(s) 753 b.

In a fourth decoding step, unpooling layer 755 b may be configured to upsample the feature maps received from convolutional layer(s) 753 c, generating upsampled feature maps with increased spatial size. As described above with regard to FIG. 7C, unpooling layer 755 b may use binary masks generated by pooling layer 745 b to assign values in the feature maps received from convolutional layer(s) 753 c to corresponding upsampled feature maps. Convolutional layer(s) 753 b may be configured to use the upsampled feature maps received from unpooling layer 755 b to generate output feature maps, such as feature map 773 b. In some aspects, convolutional layer(s) 753 b may also use feature maps received from convolutional layer(s) 743 b to generate the output feature maps. For example, convolutional layer(s) 753 b may be configured to apply one or more learnable filter functions to a stack of feature maps comprising the feature maps received from convolutional layer(s) 743 b and the upsampled feature maps received from unpooling layer 755 b. In some embodiments, the output feature maps generated by convolutional layer(s) 753 b may be provided by convolutional layer(s) 753 b to the highest level of decoding stage 750. This highest level of decoding stage 750 may include unpooling layer 755 a, convolutional layer(s) 753 a, and output layer 759.

In a fifth decoding step, unpooling layer 755 a may be configured to upsample the feature maps received from convolutional layer(s) 753 b, generating upsampled feature maps with increased spatial size. As described above with regard to FIG. 7C, unpooling layer 755 a may use binary masks generated by pooling layer 745 a to assign values in the feature maps received from convolutional layer(s) 753 b to corresponding upsampled feature maps. Convolutional layer(s) 753 a may be configured to use the upsampled feature maps received from unpooling layer 755 a to generate output feature maps, such as feature map 773 a. In some aspects, convolutional layer(s) 753 a may also use feature maps received from convolutional layer(s) 743 a to generate the output feature maps. For example, convolutional layer(s) 753 a may be configured to apply one or more learnable filter functions to a stack of feature maps comprising the feature maps received from convolutional layer(s) 743 a and the upsampled feature maps received from unpooling layer 755 a. In some embodiments, output layer 759 may be configured to use the output feature maps received from convolutional layer(s) 753 a to generate at least one destination image, such as destination image 779. In some embodiments, the at least one destination image may be provided to output 751.

In the exemplary embodiment shown in FIGS. 7D and 7E, neural network 700 may include 27 convolutional layers, with over 30 million parameters for converting a three-channel image. This exemplary arrangement is not intended to be limiting. For example, the number of layers in neural network 700 between encoding stage 740 and decoding stage 750 may range from about 10 to a few hundred (e.g., 500). As the number of layers increases, the number of model parameters also increases, providing better accuracy in prediction. However, a large number of layers may also result in increased computational cost, especially during the training stage. The suitable number of layers may change as the computational power of processors advances. Similarly, neural network 700 may include more or fewer levels of spatial resolution. In some embodiments, a different number of input channels may be used or a different image size may be used.
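For orientation, the trainable parameter count of any candidate architecture can be checked directly; the snippet below reuses the hypothetical EncoderDecoderSketch class from the sketch above and is not a statement about the disclosed 27-layer network.

```python
import torch

# Count trainable parameters of an illustrative model (the exact count
# depends entirely on the chosen architecture).
model = EncoderDecoderSketch(in_channels=3)  # e.g., a three-channel input
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {num_params:,}")
```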

FIG. 8 is a flowchart illustrating an exemplary training process 800 for training neural network 700, consistent with disclosed embodiments. In some embodiments, training process 800 may be performed by training unit 430, described above, to learn model parameters θ={W₁, b₁, W₂, b₂, . . . } of neural network 700. Training process 800 may start when training unit 430 receives a set of training images (Step 810). As described above, the set of training images may include one or more origin images acquired using a first imaging modality and corresponding destination images acquired using a second imaging modality. For example, the training images may include single- or multi-channel MRI images of an anatomical area of a patient, such as a head, torso, abdomen, and/or limbs. In some embodiments, the corresponding destination images may be acquired by a CT scan of the same or similar anatomical area of the patient.

As shown in FIG. 8, training unit 430 may initialize the predictive model in step 812. For example, training unit 430 may initialize an iteration index for training the CNN model, e.g., to zero. In some embodiments, training unit 430 may also initialize the parameters of neural network 700. For example, convolutional layer weights may be initialized to random values, and/or convolutional layer biases may be initialized to zero. In some embodiments, training unit 430 may initialize the parameters of encoding stage 740 using parameters from pre-trained models. For example, training unit 430 may borrow weights from models trained for other tasks, such as image classification. This may enable image conversion system 400 to exploit the transfer learning characteristics of neural networks to speed up convergence on an acceptable solution.

Steps 814 through 822 may be performed iteratively until one or more stopping criteria are met (e.g., the iterative process converges according to predefined criteria). At each iteration, a new batch of training data may be randomly selected and used for training the CNN model until the maximum number of iterations is reached. In some embodiments, each batch may include 30-60 sets of training images. For example, each batch may include approximately 50 pairs of origin images and destination images.

In step 814, training unit 430 may be configured to select a batch of training data. In some embodiments, this training data may be selected randomly from the training data stored in database 410. Training unit 430 may be optionally configured to pre-process the training data. For example, training unit 430 may be configured to receive one or more training images 412 from database 410 and process them to remove certain image features, consistent with process 600 described above. In step 816, training unit 430 may be configured to calculate a synthetic image according to the current parameters of neural network 700 for each set of training images. In step 818, training unit 430 may be configured to generate an error map (e.g., as described above with respect to FIG. 2) based on the calculated synthetic image and the training data. For example, when the training data includes an origin image and a corresponding destination image, training unit 430 may be configured to generate an error map as a difference between the synthetic image and the destination image.

In step 820, training unit 430 may be configured to determine or update parameters of neural network 700 based on the error map generated by step 818. For example, backpropagation algorithms known to those of skill in the art may be used to determine or update parameters of neural network 700. In some embodiments, the backpropagation algorithm may be configured to minimize a loss function with respect to the parameters of neural network 700. For example, the backpropagation algorithm may update model parameters θ = (θ₁, θ₂, . . . , θ_L) to reduce the value of the loss function. One of skill in the art would be familiar with various loss functions that training unit 430 may implement, such as the mean absolute error (MAE) of the model prediction, which may be determined as follows:

$\begin{matrix}{{E(\theta)} = {\frac{1}{n}{\sum\limits_{i = 1}^{n}\left| {y_{i}(\theta)} - z_{i} \right|}}} & {{Eq}.\mspace{14mu} 2}\end{matrix}$

where y_i(θ) indicates the calculated CT intensity value at each voxel of the synthetic image, z_i denotes the CT intensity value at each voxel of the destination image (e.g., the true value), n is the total number of image voxels, and E(θ) represents the MAE. Using the MAE as the loss function may improve learning, making the learning process robust to outliers such as noise and artifacts in the training data. Alternatively, training unit 430 may be configured to use the mean squared error (MSE) of the model prediction, which may be determined as follows:

$\begin{matrix}{{E(\theta)} = {\frac{1}{n}{\sum\limits_{i = 1}^{n}\left( {y_{i}(\theta)} - z_{i} \right)^{2}}}} & {{Eq}.\mspace{14mu} 3}\end{matrix}$

Here, E(θ) represents the MSE.
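For illustration, Eq. 2 and Eq. 3 can be computed as below. This is a minimal sketch assuming NumPy; the toy values are hypothetical and only serve to check the arithmetic.

```python
import numpy as np

def mae(synthetic: np.ndarray, destination: np.ndarray) -> float:
    """Mean absolute error, Eq. 2: E(theta) = (1/n) * sum |y_i(theta) - z_i|."""
    return float(np.mean(np.abs(synthetic - destination)))

def mse(synthetic: np.ndarray, destination: np.ndarray) -> float:
    """Mean squared error, Eq. 3: E(theta) = (1/n) * sum (y_i(theta) - z_i)^2."""
    return float(np.mean((synthetic - destination) ** 2))

# Toy example with CT intensity values at three voxels.
y = np.array([10.0, -50.0, 200.0])   # synthetic CT values y_i(theta)
z = np.array([0.0, -40.0, 230.0])    # "true" CT values z_i
print(mae(y, z))  # (10 + 10 + 30) / 3 = 16.67
print(mse(y, z))  # (100 + 100 + 900) / 3 = 366.67
```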

The backpropagation algorithm may be used to compute the gradient of the loss function with respect to the model parameters (θ), such as the weights W_k of the convolution filters and offset values b. The model parameters may then be updated iteratively by using a stochastic gradient descent algorithm. In some embodiments, batch normalization may be performed after each convolutional layer to reduce internal covariate shift.
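A single training iteration of the kind described in steps 816-820 might look like the sketch below. It assumes PyTorch and reuses the hypothetical EncoderDecoderSketch class from the earlier sketch (which already applies batch normalization after each convolution); the batch size and learning rate are assumptions, not prescribed values.

```python
import torch
import torch.nn as nn

# One illustrative training iteration: forward pass, loss, backpropagation,
# and a stochastic gradient descent update.
model = EncoderDecoderSketch(in_channels=1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.L1Loss()  # mean absolute error (Eq. 2); nn.MSELoss() gives Eq. 3

origin_batch = torch.rand(4, 1, 256, 256)        # batch of MR slices
destination_batch = torch.rand(4, 1, 256, 256)   # corresponding CT slices

optimizer.zero_grad()
synthetic_batch = model(origin_batch)                # compute synthetic images (step 816)
loss = loss_fn(synthetic_batch, destination_batch)   # error measure (step 818)
loss.backward()                                      # backpropagate gradients (step 820)
optimizer.step()                                     # stochastic gradient descent update
```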

Simple data augmentation may also be performed to artificially increase the amount of training data during model training. In such data augmentation, new pairs of origin and destination images may be created through a random translation of pixels or flipping of images in each spatial dimension for each pair of MR and CT images. By using such data augmentation, exemplary embodiments consistent with the disclosure may increase the amount of training data and thus generate a more highly trained prediction model within training unit 430, leading to more accurate synthetic images generated by the model.
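As an illustration of this kind of augmentation, the sketch below applies the same random flip and translation to both images of an MR/CT pair. It assumes NumPy; the function name, shift range, and wrap-around handling of edges are assumptions made for brevity.

```python
import numpy as np

def augment_pair(mr: np.ndarray, ct: np.ndarray, max_shift: int = 10):
    """Create a new training pair by applying the same random flip and
    pixel translation to both the MR (origin) and CT (destination) images."""
    rng = np.random.default_rng()
    # Random flip in each spatial dimension.
    for axis in (0, 1):
        if rng.random() < 0.5:
            mr, ct = np.flip(mr, axis=axis), np.flip(ct, axis=axis)
    # Random translation of pixels (edges wrap around in this simple sketch).
    shifts = tuple(int(s) for s in rng.integers(-max_shift, max_shift + 1, size=2))
    mr = np.roll(mr, shifts, axis=(0, 1))
    ct = np.roll(ct, shifts, axis=(0, 1))
    return mr, ct

mr_slice = np.random.rand(256, 256)
ct_slice = np.random.rand(256, 256)
aug_mr, aug_ct = augment_pair(mr_slice, ct_slice)
```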

In step 822, training unit 430 may be configured to determine whether stopping criteria have been satisfied. Various stopping criteria may be used, such as a predetermined maximum number of iterations or a predetermined image quality measure (e.g., a loss function value indicating that the difference between the estimated synthetic image and the destination image is sufficiently small). For example, training unit 430 may be configured to determine whether the iteration index is equal to or greater than the predetermined maximum number of iterations. Additionally or alternatively, training unit 430 may be configured to determine whether the accuracy of the synthetic image meets or exceeds an expected accuracy, e.g., by determining whether the loss function is smaller than certain threshold criteria. If training unit 430 determines that the stopping criteria have not been satisfied, the training process 800 may return to step 814. If training unit 430 determines that the stopping criteria have been satisfied, the training process 800 may proceed to step 824. In step 824, training unit 430 may be configured to store the updated version of neural network 700 for later use by image conversion unit 440.

FIG. 9A is a flowchart illustrating an exemplary image conversion process 900A using a trained convolutional neural network (e.g., neural network 700) obtained through process 800 of FIG. 8, according to some embodiments of the present disclosure. Image conversion process 900A may be performed by image conversion unit 440. In an embodiment, image conversion unit 440 may be configured to generate synthetic images from origin images acquired using the same imaging modality as the training images. For example, when neural network 700 is trained on MRI images, the image conversion unit 440 may be configured to generate synthetic images based on MRI images. Similarly, when neural network 700 is trained on single-channel MRI images, image conversion unit 440 may be configured to use the same type of single-channel MRI images (e.g., T1-weighted images). When neural network 700 is trained on multi-channel MRI images, image conversion unit 440 may be configured to use the same type of multi-channel MRI images.

As shown in FIG. 9A, process 900A may start when image conversion unit 440 receives one or more origin images in step 912. In some aspects, image conversion unit 440 may be configured to receive one or more origin images from image database 420, another component of image conversion system 400, or another system. In step 914, image conversion unit 440 may be optionally configured to preprocess the one or more origin images, consistent with process 600 described above, to remove artifacts associated with the origin images in order to increase the accuracy of the conversion process.

Image conversion unit 440 may be configured to receive neural network 700 in step 918. In some aspects, neural network 700 may be received from training unit 430, another component of image conversion system 400, or another system. Image conversion unit 440 may be configured to calculate one or more synthetic images using neural network 700 in step 920. Image conversion unit 440 may then output, in step 922, the one or more synthetic images. Outputting may include displaying a representation of one or more synthetic images for further analysis or observation, storing the one or more synthetic images in a non-transitory medium, or providing the one or more synthetic images to a computer process, program, and/or application. The non-transitory medium and computer process, program, and/or application may be on a remote system.

Process 900A may be performed to convert both 2D and 3D origin images. In the case of converting 3D images, neural network 700 may be modified to generate 3D synthetic images. As described above, input 741 of neural network 700 may be configured to accept a stack of adjacent 2D images selected from the 3D origin image. Similarly, output layer 759 of neural network 700 may be configured to generate a corresponding number of adjacent synthetic images, which can be assembled into a 3D synthetic image. In some embodiments, a 3D origin image may be divided into stacks of 2D origin images. In some embodiments, the stacks of origin images may overlap. For example, the first stack may include the first through third image slices, the second stack may include the second through fourth image slices, and the third stack may include the third through fifth image slices. Image conversion unit 440 may then be configured to convert these stacks of origin images to corresponding stacks of synthetic images. For example, when a stack of images includes three images, image conversion unit 440 may be configured to also generate three synthetic images corresponding to each of the three origin images in the stack. The stacks of synthetic images may also overlap, consistent with the stacks of origin images. As a result, for a given slice in the 3D synthetic image, multiple 2D synthetic images may be generated. These multiple synthetic images may be fused to derive one synthetic image for the slice. Various rules may be used for fusion. For example, the multiple images may be averaged, or the median values may be used, etc.
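The overlapping-stack conversion and per-slice fusion described above can be sketched as follows. This is a minimal illustration assuming NumPy; `predict_stack` is a hypothetical stand-in for the trained network, and fusion is done by a simple per-slice average (a median over the overlapping predictions could be used instead).

```python
import numpy as np

def convert_volume(volume, predict_stack, stack_size=3):
    """Convert a 3D origin volume using overlapping stacks of adjacent slices.

    `predict_stack` is assumed to map a (stack_size, H, W) origin stack to a
    (stack_size, H, W) synthetic stack. Overlapping predictions for the same
    slice are fused by averaging.
    """
    num_slices = volume.shape[0]
    accumulated = np.zeros_like(volume, dtype=np.float64)
    counts = np.zeros(num_slices)
    for start in range(num_slices - stack_size + 1):
        synthetic_stack = predict_stack(volume[start:start + stack_size])
        accumulated[start:start + stack_size] += synthetic_stack
        counts[start:start + stack_size] += 1
    return accumulated / counts[:, None, None]   # per-slice mean over overlaps

# Usage with a placeholder "network" that simply echoes its input:
volume = np.random.rand(10, 64, 64)
synthetic_volume = convert_volume(volume, predict_stack=lambda s: s)
print(synthetic_volume.shape)  # (10, 64, 64)
```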

FIG. 9B depicts an exemplary flowchart illustrating a 3D image conversion process 900B, consistent with disclosed embodiments. In some embodiments, process 900B may be performed by image conversion unit 440. Process 900B begins in step 928 by having image conversion unit 440 configured to receive one or more 3D origin images. The 3D images may be received from image database 420 or from another component of image conversion system 400. The 3D images may include images of a head, torso, abdomen, limbs, or any other anatomical area, either in part or in whole, of a patient.

In step 930, image conversion unit 440 may be configured to select an anatomical plane of a received 3D origin image. In step 932, image conversion unit 440 may be configured to generate one or more stacks of 2D origin images based on the received 3D image and the selected plane. For example, image conversion unit 440 may be configured to select among the three anatomical planes, including the axial plane, sagittal plane, and coronal plane. Image conversion unit 440 may be configured to generate 2D origin images from the received 3D origin image along the selected anatomical plane. In some aspects, image conversion unit 440 may be configured to generate a sequence of individual origin images, or a sequence of stacks of origin images.

In step 934, image conversion unit 440 may be configured to receive a neural network (e.g., neural network 700). The received neural network may correspond to the selected anatomical plane. For example, the received neural network may have been trained with images (or stacks of images) along the selected anatomical plane. Such images (or stacks of images) along the selected anatomical plane may collectively comprise dependent structure information for an axis orthogonal to the selected anatomical plane. Because the received neural network has been trained using images along the selected anatomical plane, this neural network may be capable of using this dependent structure information to improve conversion efficiency. For example, the parameters or weights of the received neural network may reflect this dependent structure information. In response to a change in the selected anatomical plane (e.g., sagittal to coronal), image conversion unit 440 may be configured to receive another neural network corresponding to the newly selected anatomical plane. For example, image conversion unit 440 may receive an axial neural network, a coronal neural network, or a sagittal neural network, according to the selected plane. Just as the previously received neural network was capable of using dependent structure information for an axis orthogonal to the previously selected anatomical plane to improve conversion efficiency, the newly received neural network may be capable of using dependent structure information for an axis orthogonal to the newly selected anatomical plane to improve conversion efficiency. In some embodiments, image conversion unit 440 may be configured to receive a neural network trained with images (or stacks of images) along multiple anatomical planes (e.g., a stack may include coronal images and sagittal images, or axial images and sagittal images, etc.). Accordingly, image conversion unit 440 may be configured to use this neural network to generate synthetic images from origin images along any selected anatomical plane or multiple anatomical planes.

In step 936, image conversion unit 440 may be configured to generate corresponding synthetic images, or stacks of synthetic images, using the received neural network. As described above, conversion efficiency may be improved using neural networks trained using images (or stacks of images) along a particular anatomical plane, as these networks may use dependent structure information specific to that particular anatomical plane. Image conversion unit 440 may repeat step 936 until all selected origin images, or selected stacks of origin images, have been converted to synthetic images.

In some embodiments, image conversion unit 440 may perform step 930 after step 934. For instance, the trained neural network received by image conversion unit 440 may be trained using 2D origin images along multiple planes (e.g., all three planes). In such a case, image conversion unit 440 may be configured to iteratively select each of the different planes and generate 2D origin images along the selected plane. These 2D origin images along the three planes may be independently input to the neural network 700 to generate synthetic images along the respective planes. For example, axial 2D origin images may be input into the neural network 700, which outputs axial synthetic images. Sagittal 2D origin images may be input into the neural network 700, resulting in sagittal synthetic images. And finally, coronal 2D origin images may be input into the neural network 700, resulting in coronal synthetic images. Alternatively, the 2D origin images of all three planes may be combined into a single sequence (e.g., axial 2D origin images followed by the sagittal 2D origin images, then followed by the coronal 2D origin images), and input to the same neural network 700, which is trained using images stacked from the three planes.

In step 938, image conversion unit 440 may be configured to aggregate the resulting 2D synthetic images into a 3D synthetic image. Such aggregation may include stacking the synthetic images along an axis orthogonal to the selected plane to obtain a 3D synthetic image.

In step 940, image conversion unit 440 may determine whether all planes of the 3D origin image have been processed. For example, image conversion unit 440 may continue to process each plane until a predefined 3D image volume has been processed. Alternatively, image conversion unit 440 may continue to process until a predefined number of 2D planar images have been processed. If all the planes have been processed based on the criteria used in step 940, then process 900B continues to step 942. If the planes have not been processed, image conversion unit 440 may be configured to return to step 930 and select another image plane for processing.

Because the 3D synthetic image of each plane provides a pixel value for any given pixel in the 3D image, every pixel may have three different pixel values. The 3D synthetic images, therefore, may be combined to determine the final 3D synthetic image. In step 942, image conversion unit 440 may be configured to determine a final 3D synthetic image. Image conversion unit 440 may be configured to determine the final 3D synthetic image by combining (e.g., as known in the art as “fusing”) the 3D synthetic images corresponding to each selected plane (e.g., the axial, sagittal, and coronal planes). In some embodiments, the value of a voxel of the fused 3D synthetic image may be an average value. For example, if the three 3D synthetic images being fused were to have CT values of 70, 80, and 30 at a given voxel, the voxel may then have an average CT value of 60. As another example, the median voxel value, e.g., 70 in the above example, may be used. Other methods of combining the values of the voxels may be used, and the above example is not intended to be limiting. In this manner, the dependent structure information specific to each anatomical plane may be incorporated into the final 3D synthetic image.
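The per-plane fusion of step 942 can be illustrated with a voxel-wise mean or median over the three volumes. This is a minimal sketch assuming NumPy; the three volumes are assumed to already be resampled onto the same voxel grid, and the variable names are hypothetical.

```python
import numpy as np

# Fuse the per-plane 3D synthetic images (axial, sagittal, coronal) into a
# final 3D synthetic image.
axial_ct = np.random.rand(64, 64, 64) * 100
sagittal_ct = np.random.rand(64, 64, 64) * 100
coronal_ct = np.random.rand(64, 64, 64) * 100

stacked = np.stack([axial_ct, sagittal_ct, coronal_ct], axis=0)
fused_mean = stacked.mean(axis=0)          # voxel-wise average (70, 80, 30 -> 60)
fused_median = np.median(stacked, axis=0)  # voxel-wise median (70, 80, 30 -> 70)

# Sanity check on the example values from the text:
print(np.mean([70, 80, 30]), np.median([70, 80, 30]))  # 60.0 70.0
```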

Various operations or functions are described herein, which may be implemented or defined as software code or instructions. Such content may be directly executable (“object” or “executable” form), source code, or difference code (“delta” or “patch” code). Software implementations of the embodiments described herein may be provided via an article of manufacture with the code or instructions stored thereon, or via a method of operating a communication interface to send data via the communication interface.

A machine or computer-readable storage medium may include one or more non-transitory media (e.g., a centralized or distributed database, and/or associated caches and servers). Such a machine or computer-readable storage medium may store computer-executable instructions or data that may cause a machine to perform the functions or operations described. Such a machine or computer-readable storage medium may include any mechanism that stores information in a form accessible by a machine (e.g., computing device, electronic system, and the like), such as recordable/non-recordable media (e.g., read only memory (ROM), random access memory (RAM), magnetic disk storage media, optical storage media, flash memory devices, and the like). For example, the term “machine-readable storage medium” or “computer-readable storage medium” shall accordingly be taken to include, but not be limited to, solid-state memories, optical media, and magnetic media.

A communication interface may include any mechanism that interfaces to any of a hardwired, wireless, optical, and the like, medium to communicate to another device, such as a memory bus interface, a processor bus interface, an Internet connection, a disk controller, and the like. The communication interface can be configured by providing configuration parameters and/or sending signals to prepare the communication interface to provide a data signal describing the software content. The communication interface can be accessed via one or more commands or signals sent to the communication interface.

What is claimed is:
 1. A computer-implemented method for generating a synthetic image of an anatomical portion based on an origin image of the anatomical portion acquired by an imaging device using a first imaging modality, the method comprising: receiving the origin image of the anatomical portion acquired by the imaging device using the first imaging modality, wherein the origin image includes a three-dimensional image; obtaining a plurality of stacks of two-dimensional images from the three-dimensional image; receiving a convolutional neural network model trained for generating a synthetic image based on origin image data; and generating, by at least one processor, the synthetic image from the origin image through use of the convolutional neural network model, the synthetic image resembling a second imaging modality depiction of the anatomical portion, wherein the second imaging modality provides imaging data that differs in format from imaging data provided by the first imaging modality, and wherein generating the synthetic image includes: generating a plurality of stacks of synthetic two-dimensional images from the plurality of stacks of two-dimensional images, using the convolutional neural network model; and determining the synthetic image by aggregating the plurality of stacks of synthetic two-dimensional images.
 2. The method of claim 1, wherein the first imaging modality is Magnetic Resonance Imaging, and the second imaging modality is Computed Tomography.
 3. The method of claim 1, wherein the first imaging modality and the second imaging modality are selected from the group of Magnetic Resonance Imaging, Computed Tomography, ultrasound imaging, Positron Emission Tomography, and Single-Photon Emission Computed Tomography.
 4. The method of claim 1, further comprising: receiving a plurality of training origin images acquired using the first imaging modality; receiving a plurality of training destination images acquired using the second imaging modality, each training destination image corresponding to a training origin image; determining a convolutional neural network architecture; and training the convolutional neural network model using the training origin images and corresponding training destination images.
 5. The method of claim 1, wherein the synthetic image includes a stack of two-dimensional images.
 6. The method of claim 1, wherein the origin image is provided in a three-dimensional volume and the synthetic image is provided in a three-dimensional volume.
 7. The method of claim 1, wherein obtaining the plurality of stacks of two-dimensional images comprises selecting a plurality of stacks of adjacent two-dimensional images from the three-dimensional image; and wherein generating the plurality of stacks of synthetic two-dimensional images comprises generating respective stacks of synthetic two-dimensional images from the plurality of stacks of adjacent two-dimensional images using the convolutional neural network model.
 8. The method of claim 7, wherein obtaining the plurality of stacks of two-dimensional images comprises creating a first stack of two-dimensional images from a first plane of the three-dimensional image and a second stack of two-dimensional images from a second plane of the three-dimensional image; and wherein generating the plurality of stacks of synthetic two-dimensional images comprises generating a first stack and a second stack of synthetic two-dimensional images from the first stack and the second stack of two-dimensional images using the convolutional neural network model.
 9. The method of claim 7, wherein the origin image includes multi-channel images of the anatomical portion acquired using different acquisition settings or using different acquisition channels of the imaging device.
 10. The method of claim 4, wherein training the convolutional neural network model further includes: generating synthetic images from the training origin images using the convolutional neural network model; determining a difference between the synthetic images and the corresponding training destination images; and updating model parameters of the convolutional neural network model based on the difference.
 11. The method of claim 10, wherein the training is completed when a difference between a respective synthetic image and a respective destination image is smaller than a predetermined threshold.
 12. The method of claim 10, wherein the difference is measured by a loss function calculated based on the synthetic image and the destination image.
 13. A system for generating a synthetic image of an anatomical portion based on an origin image of the anatomical portion acquired by an imaging device using a first imaging modality, the system comprising: an input interface configured to: receive the origin image of the anatomical portion acquired by the imaging device using the first imaging modality, wherein the origin image includes a three-dimensional image, and wherein the three-dimensional image provides a plurality of stacks of two-dimensional images; and receive a convolutional neural network model trained for generating a synthetic image based on origin image data; and at least one storage device configured to store the origin image and the convolutional neural network model; and an image processor configured to generate the synthetic image from the origin image through use of the convolutional neural network model, the synthetic image resembling a second imaging modality depiction of the anatomical portion, wherein the second imaging modality provides imaging data that differs in format from imaging data provided by the first imaging modality, and wherein the image processor is configured to generate the synthetic image with operations to: generate a plurality of stacks of synthetic two-dimensional images from the plurality of stacks of two-dimensional images, using the convolutional neural network model; and determine the synthetic image by aggregating the plurality of stacks of synthetic two-dimensional images.
 14. The system of claim 13, wherein the origin image includes multi-channel images of the anatomical portion acquired using different acquisition settings or using different acquisition channels of the imaging device.
 15. The system of claim 13, wherein the first imaging modality is Magnetic Resonance Imaging, and the second imaging modality is Computed Tomography.
 16. The system of claim 13, wherein the first imaging modality and the second imaging modality are selected from among the group of Magnetic Resonance Imaging, Computed Tomography, ultrasound imaging, Positron Emission Tomography, and Single-Photon Emission Computed Tomography.
 17. The system of claim 13, wherein the input interface is further configured to: receive a plurality of training origin images acquired using the first imaging modality; receive a plurality of training destination images acquired using the second imaging modality, each training destination image corresponding to a training origin image; and wherein the image processor is further configured to: train the convolutional neural network model using the training origin images and corresponding training destination images.
 18. The system of claim 13, wherein the synthetic image includes a stack of two-dimensional images.
 19. The system of claim 13, wherein the origin image is provided in a three-dimensional volume and the synthetic image is provided in a three-dimensional volume.
 20. The system of claim 13, wherein the image processor is further configured to: select a plurality of stacks of adjacent two-dimensional images from the three-dimensional image; and generate the plurality of stacks of synthetic two-dimensional images by generating respective stacks of synthetic two-dimensional images from the plurality of stacks of adjacent two-dimensional images using the convolutional neural network model.
 21. The system of claim 13, wherein the image processor is further configured to: create a first stack of two-dimensional images from a first plane of the three-dimensional image and a second stack of two-dimensional images from a second plane of the three-dimensional image; and generate the plurality of stacks of synthetic two-dimensional images by generating a first stack and a second stack of synthetic two-dimensional images from the first stack and the second stack of two-dimensional images using the convolutional neural network model.
 22. A non-transitory computer-readable medium comprising instructions that, when executed by at least one processor, cause the at least one processor to generate a synthetic image of an anatomical portion based on an origin image of the anatomical portion acquired by an imaging device using a first imaging modality, with operations comprising: receiving the origin image of the anatomical portion acquired by the imaging device using the first imaging modality, wherein the origin image includes a three-dimensional image; obtaining a plurality of stacks of two-dimensional images from the three-dimensional image; receiving a convolutional neural network model trained for generating a synthetic image based on the origin image; and generating the synthetic image from the origin image through use of the convolutional neural network model, the synthetic image resembling a second imaging modality depiction of the anatomical portion, wherein the second imaging modality provides imaging data that differs in format from imaging data provided by the first imaging modality, and wherein generating the synthetic image includes: generating a plurality of stacks of synthetic two-dimensional images from the plurality of stacks of two-dimensional images, using the convolutional neural network model; and determining the synthetic image by aggregating the plurality of stacks of synthetic two-dimensional images.
 23. The non-transitory computer-readable medium of claim 22, the operations further comprising: receiving a plurality of training origin images acquired using the first imaging modality; receiving a plurality of training destination images acquired using the second imaging modality, each training destination image corresponding to a training origin image; determining a convolutional neural network architecture; and training the convolutional neural network model using the training origin images and corresponding training destination images.
 24. The non-transitory computer-readable medium of claim 22, wherein obtaining the plurality of stacks of two-dimensional images comprises selecting a plurality of stacks of adjacent two-dimensional images from the three-dimensional image; and wherein generating the plurality of stacks of synthetic two-dimensional images comprises generating respective stacks of synthetic two-dimensional images from the plurality of stacks of adjacent two-dimensional images using the convolutional neural network model.
 25. The non-transitory computer-readable medium of claim 22, wherein obtaining the plurality of stacks of two-dimensional images comprises creating a first stack of two-dimensional images from a first plane of the three-dimensional image and a second stack of two-dimensional images from a second plane of the three-dimensional image; and wherein generating the plurality of stacks of synthetic two-dimensional images comprises generating a first stack and a second stack of synthetic two-dimensional images from the first stack and the second stack of two-dimensional images using the convolutional neural network model.